Transport stream for carriage of video coding extensions

ABSTRACT

A video processing device may obtain, from a descriptor for a program comprising one or more elementary streams, a plurality of profile, tier, level (PTL) syntax element sets. The video processing device may obtain, from the descriptor, a plurality of operation point syntax element sets. For each respective operation point syntax element set of the plurality of operation point syntax element sets, the video processing device may determine, for each respective layer of the respective operation point specified by the respective operation point syntax element set, based on a respective syntax element in the respective operation point syntax element set, which of the PTL syntax element sets specifies the PTL information assigned to the respective layer, the respective operation point having a plurality of layers.

This application claims the benefit of U.S. Provisional Patent Application No. 62/025,432, filed Jul. 16, 2014, the entire content of which is incorporated by reference.

TECHNICAL FIELD

This disclosure relates to video processing.

BACKGROUND

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard, and extensions of such standards. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.

Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicates the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual coefficients, which then may be quantized. The quantized coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of coefficients, and entropy coding may be applied to achieve even more compression.

A multiview coding bitstream may be generated by encoding views, e.g., from multiple perspectives. Some three-dimensional (3D) video standards have been developed that make use of multiview coding aspects. For example, different views may transmit left and right eye views to support 3D video. Alternatively, some 3D video coding processes may apply so-called multiview plus depth coding. In multiview plus depth coding, a 3D video bitstream may contain not only texture view components, but also depth view components. For example, each view may comprise one texture view component and one depth view component.

SUMMARY

Techniques of this disclosure include techniques related to MPEG-2 streams for carriage of multi-layer video data. For instance, particular techniques of this disclosure relate to MPEG-2 transport streams for carriage of High Efficiency Video Coding (HEVC) extensions, such as Multi-View HEVC (MV-HEVC), 3-dimensional HEVC (3D-HEVC), and Scalable HEVC (SHVC). In accordance with some techniques of this disclosure, a descriptor that includes syntax elements indicating layer indices of program elements that need to be present in decoding order before decoding a current program element also includes an indication of whether the current program element enhances a frame rate of a bitstream. In accordance with one or more additional techniques of this disclosure, a descriptor for a program includes syntax elements specifying sets of profile, tier, level (PTL) information and also includes syntax elements indicating which of the sets of PTL information apply to particular layers of operation points.

In one example, this disclosure describes a method of processing video data, the method comprising: determining, based on syntax elements in a descriptor corresponding to a current program element, program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and determining, based on an indication in the descriptor corresponding to the current program element, whether the current program element enhances the frame rate of a bitstream, the bitstream resulting from a set of one or more program elements that need to be accessed and be present in decoding order before decoding the current program element.

In another example, this disclosure describes a method of processing video data, the method comprising: determining whether a current program element enhances a frame rate of a bitstream; including, in a descriptor corresponding to the current program element, syntax elements indicating layer indices of program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and including, in the descriptor corresponding to the current program element, an indication of whether the current program element enhances the frame rate of the bitstream.

In another example, this disclosure describes a device for processing video data, the device comprising: one or more data storage media configured to store encoded video data; and one or more processors configured to: determine, based on syntax elements in a descriptor corresponding to a current program element comprising the encoded video data, program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and determine, based on an indication in the descriptor corresponding to the current program element, whether the current program element enhances the frame rate of a bitstream, the bitstream resulting from a set of one or more program elements that need to be accessed and be present in decoding order before decoding the current program element.

In another example, this disclosure describes a device for processing video data, the device comprising: a data storage medium configured to store encoded video data; and one or more processors configured to: determine whether a current program element comprising the encoded video data enhances a frame rate of a bitstream; include, in a descriptor corresponding to the current program element, syntax elements indicating layer indices of program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and include, in the descriptor corresponding to the current program element, an indication of whether the current program element enhances the frame rate of the bitstream.

In another example, this disclosure describes a device for processing video data, the device comprising: means for determining, based on syntax elements in a descriptor corresponding to a current program element, program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and means for determining, based on an indication in the descriptor corresponding to the current program element, whether the current program element enhances the frame rate of a bitstream, the bitstream resulting from a set of one or more program elements that need to be accessed and be present in decoding order before decoding the current program element.

In another example, this disclosure describes a computer-readable data storage medium having instructions stored thereon that, when executed, cause one or more processors to: determine, based on syntax elements in a descriptor corresponding to a current program element, program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and determine, based on an indication in the descriptor corresponding to the current program element, whether the current program element enhances the frame rate of a bitstream, the bitstream resulting from a set of one or more program elements that need to be accessed and be present in decoding order before decoding the current program element.

In another example, this disclosure describes a computer-readable data storage medium having instructions stored thereon that, when executed, cause one or more processors to: determine whether a current program element enhances a frame rate of a bitstream; include, in a descriptor corresponding to the current program element, syntax elements indicating layer indices of program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and include, in the descriptor corresponding to the current program element, an indication of whether the current program element enhances the frame rate of the bitstream.

The details of one or more examples of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an example video coding system that may utilize the techniques described in this disclosure.

FIG. 2 is a block diagram illustrating an example video encoder.

FIG. 3 is a block diagram illustrating an example video decoder.

FIG. 4A is a flowchart illustrating a first example operation to process video data, in accordance with a technique of this disclosure.

FIG. 4B is a flowchart illustrating a second example operation to process video data, in accordance with a technique of this disclosure.

FIG. 5A is a flowchart illustrating a third example operation to process video data, in accordance with a technique of this disclosure.

FIG. 5B is a flowchart illustrating a fourth example operation to process video data, in accordance with a technique of this disclosure.

DETAILED DESCRIPTION

High Efficiency Video Coding (HEVC) is a recently standardized video coding standard. Multi-layer HEVC is a term referring to extensions of HEVC supporting multiple layers. Multi-view HEVC (MV-HEVC), 3-dimensional HEVC (3D-HEVC), and Scalable HEVC (SHVC) are example types of multi-layer HEVC. In MV-HEVC and 3D-HEVC, different layers may correspond to different views. SHVC provides for a base layer and enhancement layers. The enhancement layers may provide enhancements to the frame rate or picture quality of the base layer.

Some pictures within a layer may be decoded without reference to other pictures within the same layer. Thus, network abstraction layer (NAL) units encapsulating data of certain pictures of a layer may be removed from the bitstream without affecting the decodability of other pictures in the layer. Removing NAL units encapsulating data of such pictures may reduce the frame rate of the bitstream. A subset of pictures within a layer that may be decoded without reference to other pictures within the layer may be referred to herein as a “sub-layer” or a “temporal sub-layer.”

The MPEG-2 Systems specification describes how compressed multimedia (video and audio) data streams may be multiplexed together with other data to form a single data stream suitable for digital transmission or storage. HEVC and multi-layer HEVC are example types of video data that may be multiplexed to form a data stream in the MPEG-2 Systems specification. The MPEG-2 Systems specification defines the concepts of a program stream and a transport stream. Program streams are biased for the storage and display of a single program from a digital storage service. In general, a program stream is intended for use in error-free environments. In contrast, transport streams are intended for the simultaneous delivery of multiple programs over potentially error-prone channels. Program streams and transport streams include packetized elementary stream (PES) packets. The PES packets of program streams and transport streams belong to one or more elementary streams. An elementary stream is a single, digitally coded (possibly HEVC- or multi-layer HEVC-compressed) component of a program. For example, the coded video or audio part of the program can be an elementary stream.

A transport stream may include one or more descriptors that convey further information about a program or elementary streams of a program. For instance, descriptors may include video encoding parameters, audio encoding parameters, language identification information, pan-and-scan information, conditional access details, copyright information, and so on. A broadcaster or other user may define additional private descriptors, if required. In video-related component elementary streams, the descriptors may include one or more hierarchy descriptors. A hierarchy descriptor provides information identifying the program elements containing components of hierarchically-coded video, audio, and private streams. The private streams may include metadata, such as a stream of program specific information. In general, a program element is one of the data or elementary streams included in a program (i.e., a component elementary stream of the program). In MPEG-2 transport streams, program elements are usually packetized. In MPEG-2 program streams, the program elements are not packetized.

The descriptors are separate from the encoded video data. Thus, a device, such as a Media Aware Network Element (MANE), may be able to use a descriptor to perform various functions on transport streams and program streams without decoding or otherwise analyzing encoded video data. For instance, if the video data is encoded using HEVC, the device does not need to be configured to decode HEVC-encoded video data in order to use the descriptor to perform particular functions on transport or program streams. For instance, the device may be able to use the descriptors as part of a process to determine whether to forward particular program elements to a destination device.

Each respective temporal sub-layer of each respective layer of a program may correspond to a different program component (e.g., elementary stream) of the program. As indicated above, the descriptors may include hierarchy descriptors. Each respective hierarchy descriptor provides information regarding a corresponding program component, and hence a respective temporal sub-layer. For instance, a hierarchy descriptor may include a syntax element specifying an embedded temporal sub-layer needed to decode the temporal sub-layer corresponding to the hierarchy descriptor. Furthermore, the hierarchy descriptor may include syntax elements specifying whether the corresponding temporal sub-layer provides temporal scalability (e.g., increases the frame rate) relative to the embedded temporal sub-layer, provides spatial scalability (e.g., increases picture resolution) relative to the embedded temporal sub-layer, provides quality scalability (e.g., enhances signal-to-noise quality or fidelity) relative to the embedded temporal sub-layer, and so on. A hierarchy descriptor does not indicate whether decoding the corresponding temporal sub-layer is dependent on decoding program components corresponding to different layers.
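For illustration, the following C sketch parses the descriptor fields discussed above. The field order and bit positions are modeled on the hierarchy_descriptor( ) layout in ISO/IEC 13818-1 as understood here, and the flag polarity varies across editions of that specification, so treat the layout as an assumption rather than normative syntax.

#include <stdint.h>

/* Sketch: parse the body of a hierarchy descriptor (the bytes following
 * descriptor_tag and descriptor_length). Active-high flags are assumed;
 * some editions of ISO/IEC 13818-1 define the inverse polarity. */
typedef struct {
    int temporal_scalability;           /* enhances frame rate of the embedded layer */
    int spatial_scalability;            /* enhances picture resolution */
    int quality_scalability;            /* enhances SNR quality or fidelity */
    int hierarchy_type;                 /* type of hierarchical relationship */
    int hierarchy_layer_index;          /* index of this program element */
    int hierarchy_embedded_layer_index; /* program element needed before decoding this one */
} HierarchyDescriptor;

static void parse_hierarchy_descriptor(const uint8_t *d, HierarchyDescriptor *h) {
    h->temporal_scalability           = (d[0] >> 6) & 0x1;
    h->spatial_scalability            = (d[0] >> 5) & 0x1;
    h->quality_scalability            = (d[0] >> 4) & 0x1;
    h->hierarchy_type                 =  d[0]       & 0x0F;
    h->hierarchy_layer_index          =  d[1]       & 0x3F;
    h->hierarchy_embedded_layer_index =  d[2]       & 0x3F;
}

Note that nothing in this structure relates the program component to components in other layers; that is the gap the hierarchy extension descriptor fills.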

In addition to the one or more hierarchy descriptors, the descriptors signaled in an MPEG-2 transport or program stream may include one or more hierarchy extension descriptors. Each hierarchy extension descriptor may provide additional information regarding a corresponding program component, and hence a respective temporal sub-layer. Unlike a hierarchy descriptor, a hierarchy extension descriptor may indicate which layers are required to be decoded to successfully decode the temporal sub-layer corresponding to the hierarchy extension descriptor.

A hierarchy extension descriptor does not identify which, if any, temporal sub-layer is needed to decode the temporal sub-layer corresponding to the hierarchy extension descriptor. In other words, a hierarchy extension descriptor cannot describe temporal dependency. Thus, hierarchy descriptors are used to describe only temporal dependency, whereas other types of dependency are described using hierarchy extension descriptors. As a result, interpretation of hierarchy extension descriptors is dependent on hierarchy descriptors: without the corresponding hierarchy descriptor, a device may not be able to fully determine which other program components are required to be decoded in order to decode the program component corresponding to a hierarchy extension descriptor. Thus, a hierarchy extension descriptor may not be used without the existence of a corresponding hierarchy descriptor.

Particular techniques of this disclosure may break the dependency of hierarchy extension descriptors on hierarchy descriptors. Thus, in accordance with a technique of this disclosure, a device may use a hierarchy extension descriptor without the existence of a corresponding hierarchy descriptor. For example, a computing device may determine whether a current program element enhances (e.g., increases) a frame rate of a bitstream. In this example, the computing device may include, in a descriptor corresponding to the current program element (e.g., a hierarchy extension descriptor), syntax elements indicating layer indices of program elements that need to be accessed and be present in decoding order before decoding the current program element. The descriptor may be in a transport stream. In other examples, the descriptor is in a program stream or elsewhere. In this example, the computing device includes, in the descriptor corresponding to the current program element, an indication of whether the current program element enhances the frame rate of the bitstream.
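As a minimal sketch of this technique, the following C fragment builds an in-memory descriptor that carries both the required layer indices and the proposed frame-rate indication. The structure and field names are hypothetical illustrations of the approach, not the normative hierarchy extension descriptor syntax.

#include <stdint.h>

/* Hypothetical descriptor for a current program element, combining the
 * embedded-layer list with the frame-rate enhancement indication proposed
 * in this disclosure. */
typedef struct {
    int frame_rate_enhancement;        /* 1 if this element enhances the frame rate */
    int num_embedded_layers;
    uint8_t embedded_layer_index[64];  /* layer indices of elements that must be
                                          present in decoding order before this one */
} HierarchyExtDescriptor;

static void build_descriptor(HierarchyExtDescriptor *d, int enhances_frame_rate,
                             const uint8_t *required, int num_required) {
    if (num_required > 64)
        num_required = 64;             /* illustrative bound */
    d->frame_rate_enhancement = enhances_frame_rate;
    d->num_embedded_layers = num_required;
    for (int i = 0; i < num_required; i++)
        d->embedded_layer_index[i] = required[i];
}

With the frame-rate indication carried in the same descriptor, a device no longer needs a hierarchy descriptor to learn whether the element provides temporal enhancement.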

In addition to hierarchy descriptors and hierarchy extension descriptors, the descriptors of a transport or program stream may include HEVC operation point descriptors and HEVC extension descriptors. An HEVC operation point descriptor includes information describing an operation point. An operation point is a subset of the NAL units of a bitstream. An operation point may be defined by a set of layer identifiers and a maximum temporal identifier. In some instances, an operation point consists of each NAL unit of a bitstream belonging to one of the identified layers and having a temporal identifier less than or equal to the maximum temporal identifier.

Both HEVC operation point descriptors and HEVC extension descriptors include syntax elements indicating profile, tier, and level (PTL) information. In general, a “profile” of a video coding standard is a subset of the features and tools present in the video coding standard. In other words, a profile defines which coding tools may be used. For instance, for a video encoder, a profile may be a set of coding tools that the video encoder can use to generate coded bitstreams conforming to that profile. For a video decoder, a profile may mean the set of coding tools the video decoder must have in order to be able to decode bitstreams said to conform to the profile.

A level is a defined set of constraints on the values that may be taken by the syntax elements and variables of a video coding standard. A tier is a specified category of level constraints imposed on the values of the syntax elements in the bitstream or the values of variables, where the level constraints are nested within a tier: a decoder conforming to a certain tier and level is capable of decoding all bitstreams that conform to the same tier or a lower tier at that level or any level below it. Thus, a level of a tier is a specified set of constraints imposed on values of the syntax elements in the bitstream or variables used in decoding the bitstream.
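The tier and level nesting described above reduces to a simple capability test. The following C sketch (illustrative names, with tier 0 taken as the lower tier) checks whether a decoder conforming to one tier and level can decode a bitstream labeled with another.

/* A decoder conforming to a given tier and level can decode bitstreams of
 * the same or a lower tier whose level does not exceed the decoder's level. */
typedef struct { int tier; int level_idc; } TierLevel;

static int can_decode(TierLevel decoder, TierLevel stream) {
    return stream.tier <= decoder.tier && stream.level_idc <= decoder.level_idc;
}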

As indicated above, both HEVC operation point descriptors and HEVC extension descriptors include syntax elements indicating PTL information. However, the signaling of PTL information in HEVC operation point descriptors and HEVC extension descriptors is not aligned with how PTL information is signaled at the codec level, e.g., in SHVC and MV-HEVC. For instance, at the codec level, each layer included in an operation point is assigned its own PTL information. However, this is not the case in HEVC operation point descriptors and HEVC extension descriptors.

Additional techniques of this disclosure may align the signaling of PTL information in such descriptors with the signaling of PTL information at the codec level. For instance, particular techniques of this disclosure may specify, in a descriptor corresponding to a program (e.g., an HEVC extension descriptor), PTL information for each respective layer of a set of operation points of the program. In one example, a computing device signals, in a descriptor for a program comprising one or more elementary streams, a plurality of PTL syntax element sets. The descriptor may be in a transport stream. In this example, for each respective layer of each respective operation point of a plurality of operation points of the program, the computing device or another device may assign respective PTL information to the respective layer of the respective operation point. Furthermore, in this example, the computing device signals, in the descriptor for the program, a plurality of operation point syntax element sets. In this example, each respective operation point syntax element set of the plurality of operation point syntax element sets specifies a respective operation point of the plurality of operation points. In this example, for each respective layer of the respective operation point, the respective operation point syntax element set includes a respective syntax element identifying a respective PTL syntax element set of the plurality of PTL syntax element sets specifying the respective PTL information assigned to the respective layer of the respective operation point.
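The following C sketch illustrates this organization: the descriptor carries one list of PTL syntax element sets, and each operation point syntax element set lists its layers together with a per-layer index into that list. All names and array bounds are illustrative assumptions, not the normative syntax.

#include <stdint.h>

#define MAX_PTL_SETS  32
#define MAX_LAYERS    64
#define MAX_OP_POINTS 16

typedef struct {
    uint8_t profile_idc;   /* profile for this PTL set */
    uint8_t tier_flag;     /* tier for this PTL set */
    uint8_t level_idc;     /* level for this PTL set */
} PTLSet;

typedef struct {
    int num_layers;
    uint8_t layer_id[MAX_LAYERS];    /* layers of this operation point */
    uint8_t ptl_index[MAX_LAYERS];   /* per-layer index into the PTL list */
} OperationPointSet;

typedef struct {
    int num_ptl_sets;
    PTLSet ptl[MAX_PTL_SETS];             /* plurality of PTL syntax element sets */
    int num_operation_points;
    OperationPointSet op[MAX_OP_POINTS];  /* plurality of operation point syntax element sets */
} ProgramDescriptor;

/* PTL information assigned to layer i of operation point j. */
static PTLSet ptl_for_layer(const ProgramDescriptor *d, int j, int i) {
    return d->ptl[d->op[j].ptl_index[i]];
}

Because PTL sets are signaled once and referenced by index, the descriptor avoids repeating identical PTL information across operation points that share layers.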

FIG. 1 is a block diagram illustrating an example video coding system 10 that may utilize the techniques of this disclosure. As used herein, the term “video coder” refers generically to both video encoders and video decoders. In this disclosure, the terms “video coding” or “coding” may refer generically to video encoding or video decoding.

As shown in FIG. 1, video coding system 10 includes a source device 12 and a destination device 14. Source device 12 generates encoded video data. Accordingly, source device 12 may be referred to as a video encoding device or a video encoding apparatus. Destination device 14 may decode the encoded video data generated by source device 12. Accordingly, destination device 14 may be referred to as a video decoding device or a video decoding apparatus. Source device 12 and destination device 14 may be examples of video coding devices or video coding apparatuses.

Source device 12 and destination device 14 may comprise a wide range of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, televisions, cameras, display devices, digital media players, video gaming consoles, in-car computers, video conferencing equipment, or the like.

Destination device 14 may receive encoded video data from source device 12 via a channel 16. Channel 16 may comprise one or more media or devices capable of moving the encoded video data from source device 12 to destination device 14. In one example, channel 16 may comprise one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. In this example, source device 12 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to destination device 14. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide-area network, or a global network (e.g., the Internet). The one or more communication media may include routers, switches, base stations, or other equipment that facilitate communication from source device 12 to destination device 14.

In another example, channel 16 may include a storage medium that stores encoded video data generated by source device 12. In this example, destination device 14 may access the storage medium, e.g., via disk access or card access. The storage medium may include a variety of locally-accessed data storage media such as Blu-ray discs, DVDs, CD-ROMs, flash memory, or other suitable digital storage media for storing encoded video data.

In a further example, channel 16 may include a file server or another intermediate storage device that stores encoded video data generated by source device 12. In this example, destination device 14 may access encoded video data stored at the file server or other intermediate storage device via streaming or download. The file server may be a type of server capable of storing encoded video data and transmitting the encoded video data to destination device 14. Example file servers include web servers (e.g., for a website), file transfer protocol (FTP) servers, network attached storage (NAS) devices, and local disk drives.

Destination device 14 may access the encoded video data through a standard data connection, such as an Internet connection. Example types of data connections may include wireless channels (e.g., Wi-Fi connections), wired connections (e.g., digital subscriber line (DSL), cable modem, etc.), or combinations of both that are suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the file server may be a streaming transmission, a download transmission, or a combination of both.

The techniques of this disclosure are not limited to wireless applications or settings. The techniques may be applied to video coding in support of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions, e.g., via the Internet, encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, video coding system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

FIG. 1 is merely an example and the techniques of this disclosure may apply to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding and decoding devices. In other examples, data is retrieved from a local memory, streamed over a network, or the like. A video encoding device may encode and store data to memory, and/or a video decoding device may retrieve and decode data from memory. In many examples, the encoding and decoding is performed by devices that do not communicate with one another, but simply encode data to memory and/or retrieve and decode data from memory.

In the example of FIG. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. In some examples, output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. Video source 18 may include a video capture device, e.g., a video camera, a video archive containing previously-captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources of video data.

Video encoder 20 may encode video data from video source 18. In some examples, source device 12 directly transmits the encoded video data to destination device 14 via output interface 22. In other examples, the encoded video data may also be stored onto a storage medium or a file server for later access by destination device 14 for decoding and/or playback.

In the example of FIG. 1, destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some examples, input interface 28 includes a receiver and/or a modem. Input interface 28 may receive encoded video data over channel 16. Video decoder 30 may decode encoded video data. Display device 32 may display the decoded video data. Display device 32 may be integrated with or may be external to destination device 14. Display device 32 may comprise a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combinations thereof. If the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered to be one or more processors. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.

This disclosure may generally refer to video encoder 20 “signaling” certain information to another device, such as video decoder 30. The term “signaling” may generally refer to the communication of values for syntax elements and/or other data used to decode the compressed video data. Such communication may occur in real time or near real time. Alternatively, such communication may occur over a span of time, such as might occur when storing syntax elements to a computer-readable storage medium in an encoded bitstream at the time of encoding, which then may be retrieved by a decoding device at any time after being stored to this medium.

In some examples, video encoder 20 and video decoder 30 operate according to a video compression standard, such as International Organization for Standardization (ISO)/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) extension, Multiview Video Coding (MVC) extension, and MVC-based three-dimensional video (3DV) extension. In some instances, any bitstream conforming to the MVC-based 3DV extension of H.264/AVC always contains a sub-bitstream that is compliant with the MVC extension of H.264/AVC. Furthermore, video encoder 20 and video decoder 30 may operate according to a 3DV coding extension to H.264/AVC (i.e., AVC-based 3DV) that is currently under development. In other examples, video encoder 20 and video decoder 30 may operate according to International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.261, International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) Moving Picture Experts Group (MPEG)-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, and ITU-T H.264, ISO/IEC MPEG-4 Visual. In other words, video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions.

In other examples, video encoder 20 and video decoder 30 may operate according to the High Efficiency Video Coding (HEVC) standard developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG). HEVC may be referred to as “Rec. ITU-T H.265 | ISO/IEC 23008-2.” An HEVC draft specification, referred to as HEVC WD hereinafter, is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1003-v1.zip. A version of the HEVC standard, referred to as “HEVC Version 1” hereinafter, is available from https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-H.265-201304-S!!PDF-E&type=items. A scalable extension to HEVC, named SHVC, is also being developed by the JCT-VC. A recent Working Draft (WD) of SHVC, referred to as SHVC WD3 hereinafter, is available from http://phenix.it-sudparis.eu/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1008-v3.zip. A recent working draft (WD) of the range extension of HEVC is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1005-v3.zip.

Furthermore, video encoder 20 and video decoder 30 may operate according to scalable video coding, multi-view coding, and 3DV extensions for HEVC that are currently under development. The scalable video coding extension of HEVC may be referred to as SHVC. The multiview extension to HEVC, namely MV-HEVC, is also being developed by the JCT-3V. A recent Working Draft (WD) of MV-HEVC, referred to as MV-HEVC WD5 hereinafter, is available from http://phenix.it-sudparis.eu/jct2/doc_end_user/documents/5_Vienna/wg11/JCT3V-E1004-v6.zip. The 3DV extension of HEVC may be referred to as HEVC-based 3DV or 3D-HEVC. A recent working draft (WD) of the 3D extension of HEVC, namely 3D-HEVC, is available from http://phenix.int-evry.fr/jct2/doc_end_user/documents/5_Vienna/wg11/JCT3V-E1001-v3.zip.

In HEVC and other video coding specifications, a video sequence typically includes a series of pictures. Pictures may also be referred to as “frames.” A picture may include three sample arrays, denoted S_L, S_Cb, and S_Cr. S_L is a two-dimensional array (i.e., a block) of luma samples. S_Cb is a two-dimensional array of Cb chrominance samples. S_Cr is a two-dimensional array of Cr chrominance samples. Chrominance samples may also be referred to herein as “chroma” samples. In other instances, a picture may be monochrome and may only include an array of luma samples.

To generate an encoded representation of a picture, video encoder 20 may generate a set of coding tree units (CTUs). Each of the CTUs may comprise a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples, and syntax structures used to code the samples of the coding tree blocks. In monochrome pictures or pictures having three separate color planes, a CTU may comprise a single coding tree block and syntax structures used to code the samples of the coding tree block. A coding tree block may be an N×N block of samples. A CTU may also be referred to as a “tree block” or a “largest coding unit” (LCU). The CTUs of HEVC may be broadly analogous to the macroblocks of other standards, such as H.264/AVC. However, a CTU is not necessarily limited to a particular size and may include one or more coding units (CUs). A slice may include an integer number of CTUs ordered consecutively in a raster scan order.

To generate a coded CTU, video encoder 20 may recursively perform quad-tree partitioning on the coding tree blocks of a CTU to divide the coding tree blocks into coding blocks, hence the name “coding tree units.” A coding block is an N×N block of samples. A CU may comprise a coding block of luma samples and two corresponding coding blocks of chroma samples of a picture that has a luma sample array, a Cb sample array, and a Cr sample array, and syntax structures used to code the samples of the coding blocks. In monochrome pictures or pictures having three separate color planes, a CU may comprise a single coding block and syntax structures used to code the samples of the coding block.
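As a minimal sketch of quad-tree partitioning, the following C program recursively splits one 64×64 CTU into coding blocks. The split decision here is a placeholder; a real encoder decides by criteria such as rate-distortion cost.

#include <stdio.h>

/* Placeholder split decision; an encoder would compare rate-distortion
 * costs of the split and non-split alternatives. */
static int should_split(int x, int y, int size) {
    (void)x; (void)y;
    return size > 16;
}

static void partition_ctu(int x, int y, int size, int min_cu) {
    if (size > min_cu && should_split(x, y, size)) {
        int h = size / 2;               /* recurse into the four quadrants */
        partition_ctu(x,     y,     h, min_cu);
        partition_ctu(x + h, y,     h, min_cu);
        partition_ctu(x,     y + h, h, min_cu);
        partition_ctu(x + h, y + h, h, min_cu);
    } else {
        printf("CU at (%d,%d), size %dx%d\n", x, y, size, size);
    }
}

int main(void) {
    partition_ctu(0, 0, 64, 8);         /* one 64x64 CTU, 8x8 minimum CU */
    return 0;
}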

Video encoder 20 may partition a coding block of a CU into one or more prediction blocks. A prediction block is a rectangular (i.e., square or non-square) block of samples on which the same prediction is applied. A prediction unit (PU) of a CU may comprise a prediction block of luma samples, two corresponding prediction blocks of chroma samples, and syntax structures used to predict the prediction blocks. In monochrome pictures or pictures having three separate color planes, a PU may comprise a single prediction block and syntax structures used to predict the prediction block. Video encoder 20 may generate predictive luma, Cb, and Cr blocks for luma, Cb, and Cr prediction blocks of each PU of the CU.

Video encoder 20 may use intra prediction or inter prediction to generate the predictive blocks for a PU. If video encoder 20 uses intra prediction to generate the predictive blocks of a PU, video encoder 20 may generate the predictive blocks of the PU based on decoded samples of the picture associated with the PU. In this disclosure, the phrase “based on” may indicate “based at least in part on.” If video encoder 20 uses inter prediction to generate the predictive blocks of a PU, video encoder 20 may generate the predictive blocks of the PU based on decoded samples of one or more pictures other than the picture associated with the PU.

To support inter prediction, video encoder 20 may generate one or more reference picture lists. These reference picture lists may be referred to as RefPicList0 and RefPicList1. In some examples, video encoder 20 may generate different reference picture lists for different pictures or different slices of pictures. Hence, different PUs of different pictures and/or slices may be associated with different versions of RefPicList0 and RefPicList1.

Furthermore, when video encoder 20 uses inter prediction to generate a predictive block of a PU, the video encoder may signal motion information for the PU. The motion information may include a reference index for the PU and a motion vector for the PU. The reference index for the PU may indicate a position, within one of the reference picture lists associated with the PU, of a reference picture. The motion vector for the PU may indicate a spatial displacement between a prediction block of the PU and a reference location in the reference picture. Video encoder 20 may use samples of the reference picture associated with the reference location to generate a predictive block for the PU. Because the PU may be associated with two reference pictures, the PU may have two reference indexes and two motion vectors. Hence, a PU may have a RefPicList0 reference index and a RefPicList1 reference index. The PU's RefPicList0 reference index indicates a reference picture in the PU's version of RefPicList0. The PU's RefPicList1 reference index indicates a reference picture in the PU's version of RefPicList1. Similarly, the PU may have a RefPicList0 motion vector and a RefPicList1 motion vector. The PU's RefPicList0 motion vector may indicate a reference location in a reference picture in the PU's version of RefPicList0. The PU's RefPicList1 motion vector may indicate a reference location in a reference picture in the PU's version of RefPicList1.

Video encoder 20 may signal a PU's reference indexes and motion vectors in a bitstream. In other words, video encoder 20 may include, in the bitstream, data indicating the PU's reference indexes and motion vectors. Video decoder 30 may reconstruct the PU's versions of RefPicList0 and/or RefPicList1 and may use the PU's reference indexes and motion vectors to determine one or more predictive blocks for the PU. Video decoder 30 may use the predictive blocks for the PU, along with residual data, to decode samples.

After video encoder 20 generates a predictive block for a PU of a CU, video encoder 20 may generate a residual block for the CU. Each sample of a residual block of a CU may indicate a difference between a sample in one of the predictive blocks of a PU of the CU and a corresponding sample in one of the coding blocks of the CU. For example, after video encoder 20 generates predictive luma blocks for one or more PUs of a CU, video encoder 20 may generate a luma residual block for the CU. Each sample in the CU's luma residual block indicates a difference between a luma sample in one of the CU's predictive luma blocks and a corresponding sample in the CU's original luma coding block. In addition, video encoder 20 may generate a Cb residual block for the CU. Each sample in the CU's Cb residual block may indicate a difference between a Cb sample in one of the CU's predictive Cb blocks and a corresponding sample in the CU's original Cb coding block. Video encoder 20 may also generate a Cr residual block for the CU. Each sample in the CU's Cr residual block may indicate a difference between a Cr sample in one of the CU's predictive Cr blocks and a corresponding sample in the CU's original Cr coding block.
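A residual block is formed sample by sample. The following C sketch computes one residual block from an original coding block and a predictive block of the same size; the single shared stride is a simplifying assumption.

#include <stdint.h>

/* Each residual sample is the difference between an original sample and
 * the co-located predictive sample. */
static void compute_residual(const int16_t *orig, const int16_t *pred,
                             int16_t *resid, int width, int height, int stride) {
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            resid[y * stride + x] = orig[y * stride + x] - pred[y * stride + x];
}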

Furthermore, video encoder 20 may use quad-tree partitioning to decompose the residual blocks (e.g., luma, Cb, and Cr residual blocks) of a CU into one or more transform blocks (e.g., luma, Cb, and Cr transform blocks). A transform block may be a rectangular (e.g., square or non-square) block of samples on which the same transform is applied. A transform unit (TU) of a CU may comprise a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax structures used to transform the transform block samples. Thus, each TU of a CU may be associated with a luma transform block, a Cb transform block, and a Cr transform block. The luma transform block associated with the TU may be a sub-block of the CU's luma residual block. The Cb transform block may be a sub-block of the CU's Cb residual block. The Cr transform block may be a sub-block of the CU's Cr residual block. In monochrome pictures or pictures having three separate color planes, a TU may comprise a single transform block and syntax structures used to transform the samples of the transform block.

Video encoder 20 may apply one or more transforms to a transform block of a TU to generate a coefficient block for the TU. A coefficient block may be a two-dimensional array of transform coefficients. A transform coefficient may be a scalar quantity. For example, video encoder 20 may apply one or more transforms to a luma transform block of a TU to generate a luma coefficient block for the TU. Video encoder 20 may apply one or more transforms to a Cb transform block of a TU to generate a Cb coefficient block for the TU. Video encoder 20 may apply one or more transforms to a Cr transform block of a TU to generate a Cr coefficient block for the TU.

After generating a coefficient block (e.g., a luma coefficient block, a Cb coefficient block, or a Cr coefficient block), video encoder 20 may quantize the coefficient block. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing further compression. After video encoder 20 quantizes a coefficient block, video encoder 20 may entropy encode syntax elements indicating the quantized transform coefficients. For example, video encoder 20 may perform Context-Adaptive Binary Arithmetic Coding (CABAC) on the syntax elements indicating the quantized transform coefficients.
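The following C sketch shows scalar quantization in its simplest form: each transform coefficient is divided by a step size with rounding to the nearest integer. The flat step size is an assumption for illustration; HEVC derives the step from a quantization parameter (QP) and may apply scaling lists.

#include <stdint.h>

/* Quantize n transform coefficients with a uniform step size. */
static void quantize(const int32_t *coeff, int32_t *qcoeff, int n, int32_t step) {
    for (int i = 0; i < n; i++) {
        int32_t c = coeff[i];
        /* add or subtract half a step so truncation rounds to nearest */
        qcoeff[i] = (c >= 0 ? c + step / 2 : c - step / 2) / step;
    }
}

Larger step sizes discard more precision and yield smaller entropy-coded output, which is the compression/quality trade-off quantization controls.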

Video encoder 20 may output a bitstream that includes a sequence of bits that forms a representation of coded pictures and associated data. The term “bitstream” may be a collective term used to refer to either a Network Abstraction Layer (NAL) unit stream (e.g., a sequence of NAL units) or a byte stream (e.g., an encapsulation of a NAL unit stream containing start code prefixes and NAL units as specified by Annex B of the HEVC standard). A NAL unit is a syntax structure containing an indication of the type of data in the NAL unit and bytes containing that data in the form of a raw byte sequence payload (RBSP) interspersed as necessary with emulation prevention bits. Each of the NAL units may include a NAL unit header and may encapsulate an RBSP. The NAL unit header may include a syntax element indicating a NAL unit type code. The NAL unit type code specified by the NAL unit header of a NAL unit indicates the type of the NAL unit. An RBSP may be a syntax structure containing an integer number of bytes that is encapsulated within a NAL unit. In some instances, an RBSP includes zero bits.
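For concreteness, the following C fragment parses the two-byte HEVC NAL unit header, which (per HEVC version 1) packs forbidden_zero_bit (1 bit), nal_unit_type (6 bits), nuh_layer_id (6 bits), and nuh_temporal_id_plus1 (3 bits).

#include <stdint.h>

typedef struct { int type; int layer_id; int temporal_id; } NalHeader;

static NalHeader parse_nal_header(const uint8_t b[2]) {
    NalHeader h;
    h.type        = (b[0] >> 1) & 0x3F;                /* nal_unit_type */
    h.layer_id    = ((b[0] & 0x1) << 5) | (b[1] >> 3); /* nuh_layer_id */
    h.temporal_id = (b[1] & 0x7) - 1;                  /* TemporalId = nuh_temporal_id_plus1 - 1 */
    return h;
}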

Different types of NAL units may encapsulate different types of RBSPs. For example, a first type of NAL unit may encapsulate an RBSP for a picture parameter set (PPS), a second type of NAL unit may encapsulate an RBSP for a coded slice, a third type of NAL unit may encapsulate an RBSP for supplemental enhancement information (SEI), and so on. NAL units that encapsulate RBSPs for video coding data (as opposed to RBSPs for parameter sets and SEI messages) may be referred to as video coding layer (VCL) NAL units. NAL units that contain parameter sets (e.g., VPSs, SPSs, PPSs, etc.) may be referred to as parameter set NAL units.

Video decoder 30 may receive a bitstream generated by video encoder 20. In addition, video decoder 30 may parse the bitstream to obtain syntax elements from the bitstream. Video decoder 30 may reconstruct the pictures of the video data based at least in part on the syntax elements obtained from the bitstream. The process to reconstruct the video data may be generally reciprocal to the process performed by video encoder 20. For instance, video decoder 30 may use motion vectors of PUs to determine predictive blocks for the PUs of a current CU. In addition, video decoder 30 may inverse quantize coefficient blocks associated with TUs of the current CU. Video decoder 30 may perform inverse transforms on the coefficient blocks to reconstruct transform blocks associated with the TUs of the current CU. Video decoder 30 may reconstruct the coding blocks of the current CU by adding the samples of the predictive blocks for PUs of the current CU to corresponding samples of the transform blocks of the TUs of the current CU. By reconstructing the coding blocks for each CU of a picture, video decoder 30 may reconstruct the picture.

In multi-view coding, there may be multiple views of the same scene from different viewpoints. In the context of multi-view coding, the term “access unit” may be used to refer to the set of pictures that correspond to the same time instance. Thus, video data may be conceptualized as a series of access units occurring over time. A “view component” may be a coded representation of a view in a single access unit. In this disclosure, a “view” may refer to a sequence of view components associated with the same view identifier. In some examples, a view component may be a texture view component (i.e., a texture picture) or a depth view component (i.e., a depth picture).

Multi-view coding supports inter-view prediction. Inter-view prediction is similar to the inter prediction used in HEVC and may use the same syntax elements. However, when a video coder performs inter-view prediction on a current video unit (such as a PU), video encoder 20 may use, as a reference picture, a picture that is in the same access unit as the current video unit, but in a different view. In contrast, conventional inter prediction only uses pictures in different access units as reference pictures.

In multi-view coding, a view may be referred to as a “base view” if a video decoder (e.g., video decoder 30) can decode pictures in the view without reference to pictures in any other view. When coding a picture in one of the non-base views, a video coder (such as video encoder 20 or video decoder 30) may add a picture into a reference picture list if the picture is in a different view but within a same time instance (i.e., access unit) as the picture that the video coder is currently coding. Like other inter prediction reference pictures, the video coder may insert an inter-view prediction reference picture at any position of a reference picture list.

For instance, NAL units may include headers (i.e., NAL unit headers) and payloads (e.g., RBSPs). The NAL unit headers may include nuh_layer_id syntax elements, which may also be named nuh_reserved_zero_6bits syntax elements. NAL units whose nuh_layer_id syntax elements specify different values belong to different “layers” of a bitstream. Thus, in multi-view coding, 3DV, or SVC, the nuh_layer_id syntax element of a NAL unit specifies a layer identifier (i.e., a layer ID) of the NAL unit. The nuh_layer_id syntax element of a NAL unit is equal to 0 if the NAL unit relates to a base layer in multi-view coding, 3DV coding, or SVC. Data in a base layer of a bitstream may be decoded without reference to data in any other layer of the bitstream. If the NAL unit does not relate to a base layer in multi-view coding, 3DV, or SVC, the nuh_layer_id syntax element may have a non-zero value. In multi-view coding and 3DV coding, different layers of a bitstream may correspond to different views. In SVC, layers other than the base layer may be referred to as “enhancement layers” and may provide information that enhances the visual quality of video data decoded from the bitstream.

Furthermore, some pictures within a layer may be decoded without reference to other pictures within the same layer. Thus, NAL units encapsulating data of certain pictures of a layer may be removed from the bitstream without affecting the decodability of other pictures in the layer. Removing NAL units encapsulating data of such pictures may reduce the frame rate of the bitstream. A subset of pictures within a layer that may be decoded without reference to other pictures within the layer may be referred to herein as a “sub-layer” or a “temporal sub-layer.”

NAL units may include temporal_id syntax elements. The temporal_id syntax element of a NAL unit specifies a temporal identifier of the NAL unit. The temporal identifier of a NAL unit identifies a sub-layer with which the NAL unit is associated. Thus, each sub-layer of a bitstream may be associated with a different temporal identifier. If the temporal identifier of a first NAL unit is less than the temporal identifier of a second NAL unit, the data encapsulated by the first NAL unit may be decoded without reference to the data encapsulated by the second NAL unit.

A bitstream may be associated with a plurality of operation points. Each operation point of a bitstream is associated with a set of layer identifiers (i.e., a set of nuh_reserved_zero_6bits values) and a temporal identifier. The set of layer identifiers may be denoted as OpLayerIdSet and the temporal identifier may be denoted as TemporalID. If a NAL unit's layer identifier is in an operation point's set of layer identifiers and the NAL unit's temporal identifier is less than or equal to the operation point's temporal identifier, the NAL unit is associated with the operation point. An operation point representation is a bitstream subset that is associated with an operation point. The operation point representation may include each NAL unit that is associated with the operation point. In some examples, the operation point representation does not include VCL NAL units that are not associated with the operation point.
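The membership rule above translates directly into code. The following C sketch tests whether a NAL unit, given its layer identifier and temporal identifier, is associated with an operation point defined by an OpLayerIdSet and a maximum temporal identifier (the array bound is illustrative).

typedef struct {
    int num_layers;
    int layer_ids[64];    /* OpLayerIdSet */
    int max_temporal_id;  /* TemporalID of the operation point */
} OpPoint;

static int in_operation_point(const OpPoint *op, int nuh_layer_id, int temporal_id) {
    if (temporal_id > op->max_temporal_id)
        return 0;
    for (int i = 0; i < op->num_layers; i++)
        if (op->layer_ids[i] == nuh_layer_id)
            return 1;
    return 0;
}

Extracting an operation point representation amounts to keeping each NAL unit for which this test returns 1; temporal down-switching, discussed below, is the special case of lowering max_temporal_id.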

A media aware network element (MANE) 17 may apply bitstream thinning to an HEVC bitstream that is encoded with multiple sub-layers. MANE 17 may comprise various types of computing devices, each of which may comprise one or more processors and data storage media. At any point in the bitstream, MANE 17 can start removing NAL units of higher sub-layers (i.e., sub-layers associated with higher temporal identifiers) because the pictures in the lower sub-layers (i.e., sub-layers associated with lower temporal identifiers) remain decodable; the decoding process for the pictures in the lower sub-layers does not depend on the NAL units of the higher sub-layers. The action of removing all NAL units with temporal identifiers higher than a certain value can be referred to as temporal down-switching. Temporal down-switching may always be possible.

The term “temporal up-switching” may refer to the action of starting to forward NAL units of a certain sub-layer that has not been forwarded up until that point. Temporal up-switching may only be possible if none of the pictures in the sub-layer that is switched to depend on any picture in the same sub-layer prior to the point in the bitstream at which the switch was performed. Thus, the term “temporal sub-layer switching point” may refer to a picture that has no dependency on any other picture that is in the same sub-layer as the picture and that precedes the picture in decoding order.

HEVC and other video coding standards specify profiles, tiers, and levels. Profiles, tiers, and levels specify restrictions on bitstreams and hence limits on the capabilities needed to decode the bitstreams. Profiles, tiers, and levels may also be used to indicate interoperability points between individual decoder implementations. Each profile specifies a subset of the algorithmic features and tools present in a video coding standard. Video encoders are not required to make use of all features supported in a profile. Each level of a tier may specify a set of limits on the values that syntax elements and variables may have. The same set of tier and level definitions may be used with all profiles, but individual implementations may support a different tier and, within a tier, a different level for each supported profile. For any given profile, a level of a tier may generally correspond to a particular decoder processing load and memory capability. Capabilities of video decoders may be specified in terms of the ability to decode video streams conforming to the constraints of particular profiles, tiers, and levels. For each such profile, the tier and level supported for that profile may also be expressed. Some video decoders may not be able to decode particular profiles, tiers, or levels.

In HEVC, profiles, tiers, and levels may be signaled using the profile_tier_level( ) syntax structure. The profile_tier_level( ) syntax structure may be included in a VPS and/or an SPS. The profile_tier_level( ) syntax structure may include a general_profile_idc syntax element, a general_tier_flag syntax element, and a general_level_idc syntax element. The general_profile_idc syntax element may indicate a profile to which a coded video sequence (CVS) conforms. The general_tier_flag syntax element may indicate a tier context for interpretation of the general_level_idc syntax element. The general_level_idc syntax element may indicate a level to which a CVS conforms. Other values for these syntax elements may be reserved.
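As in HEVC version 1, the fixed “general” portion of profile_tier_level( ) occupies twelve bytes: one byte packing general_profile_space, general_tier_flag, and general_profile_idc; four bytes of profile-compatibility flags; six bytes of constraint and reserved flags; and one byte of general_level_idc. The following C sketch extracts the elements discussed above.

#include <stdint.h>

typedef struct { int profile_space, tier_flag, profile_idc, level_idc; } GeneralPTL;

static GeneralPTL parse_general_ptl(const uint8_t b[12]) {
    GeneralPTL p;
    p.profile_space = (b[0] >> 6) & 0x3;  /* general_profile_space */
    p.tier_flag     = (b[0] >> 5) & 0x1;  /* general_tier_flag */
    p.profile_idc   =  b[0]       & 0x1F; /* general_profile_idc */
    p.level_idc     =  b[11];             /* general_level_idc, e.g. 120 = level 4 */
    return p;
}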

Capabilities of video decoders may be specified in terms of the ability to decode video streams conforming to the constraints of profiles, tiers, and levels. For each such profile, the tier and level supported for that profile may also be expressed. In some examples, video decoders do not infer that a reserved value of the general_profile_idc syntax element between the values specified in HEVC indicates intermediate capabilities between the specified profiles. However, video decoders may infer that a reserved value of the general_level_idc syntax element associated with a particular value of the general_tier_flag syntax element between the values specified in HEVC indicates intermediate capabilities between the specified levels of the tier.

The MPEG-2 Systems specification describes how compressed multimedia (video and audio) data streams may be multiplexed together with other data to form a single data stream suitable for digital transmission or storage. A specification of MPEG-2 TS is ITU-T Recommendation H.222.0, June 2012 version (hereinafter, “H.222.0”), in which support for AVC and AVC extensions is provided. An amendment of MPEG-2 TS for HEVC has been developed. The latest document is “Text of ISO/IEC 13818-1: 2013/Final Draft Amendment 3—Transport of HEVC video over MPEG-2 Systems,” in MPEG document w13656, July 2013 (hereinafter, “FDAM 3”). Recently, an amendment of MPEG-2 TS for carriage of layered HEVC has been started. The latest document is “Text of ISO/IEC 13818-1:2013/Study of PDAM 7—Carriage of Layered HEVC,” in MPEG document w14562, July 2014 (hereinafter, “Study of PDAM 7”).

In the MPEG-2 Systems specification, an elementary stream is a single, digitally coded (possibly MPEG-compressed) component of a program. For example, the coded video or audio part of the program can be an elementary stream. An elementary stream is first converted into a packetized elementary stream (PES) before being multiplexed into a program stream or transport stream. Within the same program, a stream_id is used to distinguish the PES packets belonging to one elementary stream from those belonging to another.

In the MPEG-2 Systems specification, program streams and transport streams are two alternative multiplexes targeting different applications. Program streams are biased toward the storage and display of a single program from a digital storage service. Program streams are intended primarily for use in error-free environments because program streams are susceptible to errors.

A program stream comprises the elementary streams belonging to the program and typically contains variable-length packets. In a program stream, PES packets that are derived from the contributing elementary streams are organized into ‘packs.’ A pack comprises a pack header, an optional system header, and any number of PES packets taken from any of the contributing elementary streams, in any order. The system header contains a summary of the characteristics of the program stream, such as its maximum data rate, the number of contributing video and audio elementary streams, and further timing information. A decoder may use the information contained in a system header to determine whether the decoder is capable of decoding the program stream.

Transport streams are primarily intended for the simultaneous delivery of a number of programs over potentially error-prone channels. A transport stream is a multiplex devised for multi-program applications such as broadcasting, so that a single transport stream can accommodate many independent programs. A transport stream comprises a succession of transport packets. In some instances, each of the transport packets is 188 bytes long. The use of short, fixed-length packets means that transport streams are not as susceptible to errors as program streams. Further, each 188-byte transport packet may be given additional error protection by processing the transport packet through a standard error protection process, such as Reed-Solomon encoding. The improved error resilience of the transport stream means a transport packet has a better chance of surviving error-prone channels, such as channels found in a broadcast environment. It might seem that the transport stream is clearly the better of the two multiplexes, with its increased error resilience and ability to carry many simultaneous programs. However, the transport stream is a more sophisticated multiplex than the program stream and is consequently more difficult to create and to demultiplex.

The first byte of a transport packet is a synchronization byte, which, in some instances, is 0x47. A single transport stream may carry many different programs, each comprising many packetized elementary streams. A Packet Identifier (PID) field is used to distinguish transport packets containing the data of one elementary stream from those carrying the data of other elementary streams. In some instances, the PID is 13 bits. It is the responsibility of the multiplexer to ensure that each elementary stream is awarded a unique PID value. The transport packet header also includes a 4-bit continuity counter field, which is incremented between successive transport packets belonging to the same elementary stream. This may enable a decoder to detect the loss or gain of a transport packet and potentially conceal errors that might otherwise result from such an event.
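
For purposes of illustration, the fixed packet header layout described above can be parsed with simple bit masking. The following is a minimal C sketch of such parsing; the structure and function names are illustrative and are not defined by the MPEG-2 Systems specification. The sketch assumes a plain 188-byte packet and omits adaptation field and payload handling.

#include <stdint.h>

#define TS_PACKET_SIZE 188
#define TS_SYNC_BYTE   0x47

/* Illustrative container for the header fields discussed above. */
typedef struct {
    uint16_t pid;                 /* 13-bit packet identifier */
    uint8_t  continuity_counter;  /* 4-bit per-elementary-stream counter */
} TsHeaderInfo;

/* Returns 0 on success, -1 if synchronization is lost. */
int parse_ts_header(const uint8_t pkt[TS_PACKET_SIZE], TsHeaderInfo *out)
{
    if (pkt[0] != TS_SYNC_BYTE)
        return -1;                                           /* first byte must be 0x47 */
    out->pid = (uint16_t)(((pkt[1] & 0x1F) << 8) | pkt[2]);  /* 13 bits across bytes 1-2 */
    out->continuity_counter = pkt[3] & 0x0F;                 /* low nibble of byte 3 */
    return 0;
}

A demultiplexer may track the continuity counter separately for each PID; a jump of more than one (modulo 16) between successive packets of the same elementary stream suggests a lost or duplicated packet.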

Although it is clear, based on a PID value, which elementary stream a transport packet belongs to, the decoder also needs to know which elementary streams belong to which program. Accordingly, a transport stream comprises program specific information (PSI) to explicitly specify relationships between the programs and the component elementary streams.

The program specific information may include one or more program map tables (PMTs). Each program carried in a transport stream has an associated program map table. The PMT gives details about the program and the elementary streams that comprise the program. For example, a program with program number 3 may contain video with PID 33, English audio with PID 57, and Chinese audio with PID 60. A PMT may include details regarding more than one program.

The basic program map table may include some of the many descriptors specified within the MPEG-2 Systems specification. Such descriptors convey further information about a program or its component elementary streams. The descriptors may include video encoding parameters, audio encoding parameters, language identification, pan-and-scan information, conditional access details, copyright information, and so on. A broadcaster or other user may define additional private descriptors if required. As discussed in detail elsewhere in this disclosure, the video-related component elementary streams may also have an associated hierarchy descriptor, which provides information to identify the program elements containing components of hierarchically-coded video, audio, and private streams.

In addition to the PMT, the PSI may include a program stream map (PSM). The PSM provides a description of the elementary streams in a program stream and the elementary streams' relationships to one another. When carried in a transport stream, the program stream map shall not be modified. The PSM is present as a PES packet when the stream_id value is 0xBC.

Furthermore, the PSI may include a program association table (PAT). The program association table includes a complete list of all the programs available in a transport stream. In some examples, the PAT always has the PID value 0. Each program is listed along with the PID value of the transport packets that contain the program map table of the program. The PSI may also include a network information table (NIT) and a conditional access table (CAT). The program number zero, specified in the PAT, points to the NIT. The NIT is optional and, when present, provides information about the physical network carrying the transport stream, such as channel frequencies, satellite transponder details, modulation characteristics, service originator, service name, and details of alternative networks available. If any elementary streams within a transport stream are scrambled, a CAT must be present. The CAT provides details of the scrambling system(s) in use and provides the PID values of transport packets that contain the conditional access management and entitlement information. The format of this information is not specified within the MPEG-2 Systems specification.
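
To illustrate the PAT structure described above, the following C sketch walks the program loop of a PAT section and distinguishes the NIT entry (program number zero) from PMT entries. The function name is illustrative; section-length validation and CRC_32 checking are omitted for brevity.

#include <stdint.h>
#include <stdio.h>

/* `sec` points at the first byte (table_id) of a complete PAT section;
 * `sec_len` is the total section length in bytes, including the 4-byte
 * CRC_32 at the end. The program loop starts 8 bytes into the section,
 * with one 4-byte entry per program. */
void walk_pat(const uint8_t *sec, int sec_len)
{
    for (int off = 8; off + 4 <= sec_len - 4; off += 4) {
        uint16_t program_number = (uint16_t)((sec[off] << 8) | sec[off + 1]);
        uint16_t pid = (uint16_t)(((sec[off + 2] & 0x1F) << 8) | sec[off + 3]);
        if (program_number == 0)
            printf("NIT carried on PID 0x%03X\n", pid);  /* program zero points to the NIT */
        else
            printf("program %u: PMT carried on PID 0x%03X\n", program_number, pid);
    }
}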

In the MPEG-2 transport stream, a hierarchy descriptor is designed to signal the hierarchy of the sub-bitstreams in different elementary streams. The hierarchy descriptor provides information to identify the program elements containing components of hierarchically-coded video, audio, and private streams (see Table 2-49, below).

TABLE 2-49  Hierarchy descriptor

Syntax                                      No. of bits   Mnemonic
hierarchy_descriptor( ) {
    descriptor_tag                          8             uimsbf
    descriptor_length                       8             uimsbf
    reserved                                1             bslbf
    temporal_scalability_flag               1             bslbf
    spatial_scalability_flag                1             bslbf
    quality_scalability_flag                1             bslbf
    hierarchy_type                          4             uimsbf
    reserved                                2             bslbf
    hierarchy_layer_index                   6             uimsbf
    tref_present_flag                       1             bslbf
    reserved                                1             bslbf
    hierarchy_embedded_layer_index          6             uimsbf
    reserved                                2             bslbf
    hierarchy_channel                       6             uimsbf
}

In Table 2-49, temporal_scalability_flag is a 1-bit flag which, when set to ‘0’, indicates that the associated program element enhances the frame rate of the bitstream resulting from the program element referenced by the hierarchy_embedded_layer_index. The value of ‘1’ for this flag is reserved.

spatial_scalability_flag is a 1-bit flag which, when set to ‘0’, indicates that the associated program element enhances the spatial resolution of the bitstream resulting from the program element referenced by the hierarchy_embedded_layer_index. The value of ‘1’ for this flag is reserved.

quality_scalability_flag is a 1-bit flag which, when set to ‘0’, indicates that the associated program element enhances the signal-to-noise ratio (SNR) quality or fidelity of the bitstream resulting from the program element referenced by the hierarchy_embedded_layer_index. The value of ‘1’ for this flag is reserved.

hierarchy_type indicates a hierarchy type. The hierarchical relation between the associated hierarchy layer and its hierarchy embedded layer is defined in Table 2-50, which is presented below. If scalability applies in more than one dimension, hierarchy_type shall be set to the value of ‘8’ (“Combined Scalability”), and the flags temporal_scalability_flag, spatial_scalability_flag, and quality_scalability_flag shall be set accordingly. For MVC video sub-bitstreams, hierarchy_type shall be set to the value of ‘9’ (“MVC video sub-bitstream”), and the flags temporal_scalability_flag, spatial_scalability_flag, and quality_scalability_flag shall be set to ‘1’. For MVC base view sub-bitstreams, hierarchy_type shall be set to the value of ‘15’, and the flags temporal_scalability_flag, spatial_scalability_flag, and quality_scalability_flag shall be set to ‘1’.

hierarchy_layer_index is a 6-bit field that defines a unique index of the associated program element in a table of coding layer hierarchies. Indices shall be unique within a single program definition. For video sub-bitstreams of AVC video streams conforming to one or more profiles defined in Annex G of Rec. ITU-T H.264|ISO/IEC 14496-10, this is the program element index, which is assigned in such a way that the bitstream order will be correct if associated SVC dependency representations of the video sub-bitstreams of the same access unit are re-assembled in increasing order of hierarchy_layer_index. For MVC video sub-bitstreams of AVC video streams conforming to one or more profiles defined in Annex H of Rec. ITU-T H.264|ISO/IEC 14496-10, this is the program element index, which is assigned in such a way that the bitstream order will be correct if associated MVC view-component subsets of the MVC video sub-bitstreams of the same access unit are re-assembled in increasing order of hierarchy_layer_index.

tref_present_flag is a 1-bit flag which, when set to ‘0’, indicates that the TREF field may be present in the PES packet headers in the associated elementary stream. The value of ‘1’ for this flag is reserved.

hierarchy_embedded_layer_index is a 6-bit field defining the hierarchy_layer_index of the program element that needs to be accessed and be present in decoding order before decoding of the elementary stream associated with this hierarchy_descriptor. hierarchy_embedded_layer_index is undefined if the hierarchy_type value is 15.

hierarchy_channel is a 6-bit field that indicates the intended channel number for the associated program element in an ordered set of transmission channels. The most robust transmission channel is defined by the lowest value of this field with respect to the overall transmission hierarchy definition. A given hierarchy_channel may at the same time be assigned to several program elements.

TABLE 2-50  Hierarchy_type field values

Value    Description
0        Reserved
1        Spatial Scalability
2        SNR Scalability
3        Temporal Scalability
4        Data partitioning
5        Extension bitstream
6        Private Stream
7        Multi-view Profile
8        Combined Scalability
9        MVC video sub-bitstream
10-14    Reserved
15       Base layer or MVC base view sub-bitstream or AVC video sub-bitstream of MVC
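
Because every field in Table 2-49 falls within six bytes at fixed bit positions, the descriptor can be parsed with fixed shifts and masks. The following C sketch shows one possible parser; the structure and function names are illustrative and not part of the specification.

#include <stdint.h>

typedef struct {
    uint8_t temporal_scalability_flag;
    uint8_t spatial_scalability_flag;
    uint8_t quality_scalability_flag;
    uint8_t hierarchy_type;                 /* interpreted per Table 2-50 */
    uint8_t hierarchy_layer_index;
    uint8_t tref_present_flag;
    uint8_t hierarchy_embedded_layer_index; /* undefined when hierarchy_type == 15 */
    uint8_t hierarchy_channel;
} HierarchyDescriptor;

/* `buf` points at descriptor_tag; returns 0 on success, -1 if the
 * descriptor body is shorter than the 4 bytes Table 2-49 requires. */
int parse_hierarchy_descriptor(const uint8_t *buf, HierarchyDescriptor *d)
{
    if (buf[1] < 4)
        return -1;                               /* descriptor_length too small */
    d->temporal_scalability_flag      = (buf[2] >> 6) & 1;
    d->spatial_scalability_flag       = (buf[2] >> 5) & 1;
    d->quality_scalability_flag      = (buf[2] >> 4) & 1;
    d->hierarchy_type                 =  buf[2] & 0x0F;
    d->hierarchy_layer_index          =  buf[3] & 0x3F;
    d->tref_present_flag              = (buf[4] >> 7) & 1;
    d->hierarchy_embedded_layer_index =  buf[4] & 0x3F;
    d->hierarchy_channel              =  buf[5] & 0x3F;
    return 0;
}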

In MPEG-2 TS, a hierarchy extension descriptor may provide additional information regarding a corresponding program component, and hence a respective temporal sub-layer. For instance, when a hierarchy extension descriptor is present, the hierarchy extension descriptor is used to specify the dependency of layers present in different elementary streams. Unlike a hierarchy descriptor, a hierarchy extension descriptor may indicate which layers are required to be decoded to successfully decode the temporal sub-layer corresponding to the hierarchy extension descriptor. Table 7-3, below, indicates the syntax of a hierarchy extension descriptor, as specified in FDAM 3.

TABLE 7-3

Syntax                                         No. of bits   Mnemonic
hierarchy_extension_descriptor( ) {
    descriptor_tag                             8             uimsbf
    descriptor_length                          8             uimsbf
    extension_dimension_bits                   16            bslbf
    hierarchy_layer_index                      6             uimsbf
    temporal_id                                3             uimsbf
    nuh_layer_id                               6             uimsbf
    tref_present_flag                          1             bslbf
    num_embedded_layers                        6             uimsbf
    hierarchy_channel                          6             uimsbf
    reserved                                   4             bslbf
    for( i = 0; i < num_embedded_layers; i++ ) {
        hierarchy_ext_embedded_layer_index     6             uimsbf
        reserved                               2             bslbf
    }
}

In Table 7-3, above, extension_dimension_bits is a 16-bit field indicating the possible enhancement of the associated program element from the base layer resulting from the program element of the layer with nuh_layer_id equal to 0. The allocation of the bits to enhancement dimensions is as follows.

TABLE 7-4  Semantics of extension_dimension_bits

Index to bits   Description
0               Multi-view enhancement
1               Spatial scalability, including SNR
2               Depth enhancement
3               AVC base layer
4               MPEG-2 base layer
5~15            Reserved

The i-th bit of extension_dimension_bits being equal to 1 indicates that the corresponding enhancement dimension is present.

hierarchy_layer_index is a 6-bit field that defines a unique index of the associated program element in a table of coding layer hierarchies. Indices shall be unique within a single program definition. For video sub-bitstreams of HEVC video streams conforming to one or more profiles defined in Annex G or H of Rec. ITU-T H.265|ISO/IEC 23008-2, this is the program element index, which is assigned in such a way that the bitstream order will be correct if associated dependency layers of the video sub-bitstreams of the same access unit are re-assembled in increasing order of hierarchy_layer_index.

tref_present_flag is a 1-bit flag which, when set to ‘0’, indicates that the TREF field may be present in the PES packet headers in the associated elementary stream. The value of ‘1’ for this flag is reserved.

nuh_layer_id is a 6-bit field specifying the highest nuh_layer_id of the NAL units in the elementary stream associated with this hierarchy_extension_descriptor( ).

temporal_id is a 3-bit field specifying the highest TemporalId of the NAL units in the elementary stream associated with this hierarchy_extension_descriptor( ).

num_embedded_layers is a 6-bit field specifying the number of directly dependent program elements that need to be accessed and be present in decoding order before decoding of the elementary stream associated with this hierarchy_extension_descriptor( ).

hierarchy_ext_embedded_layer_index is a 6-bit field defining the hierarchy_layer_index of the program element that needs to be accessed and be present in decoding order before decoding of the elementary stream associated with this hierarchy_extension_descriptor. This field is undefined if the hierarchy_type value is 15.

hierarchy_channel is a 6-bit field indicating the intended channel number for the associated program element in an ordered set of transmission channels. The most robust transmission channel is defined by the lowest value of this field with respect to the overall transmission hierarchy definition. A given hierarchy_channel may at the same time be assigned to several program elements. In other examples, the syntax elements of the hierarchy extension descriptor may have different semantics.
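
Unlike Table 2-49, several fields in Table 7-3 cross byte boundaries, so a small bit reader is convenient. The following C sketch parses the descriptor as laid out in Table 7-3; all names are illustrative, and bounds checking against descriptor_length is omitted for brevity.

#include <stddef.h>
#include <stdint.h>

/* Tiny MSB-first bit reader. */
typedef struct { const uint8_t *p; size_t bit; } BitReader;

static uint32_t get_bits(BitReader *br, unsigned n)
{
    uint32_t v = 0;
    while (n--) {
        v = (v << 1) | ((br->p[br->bit >> 3] >> (7u - (br->bit & 7u))) & 1u);
        br->bit++;
    }
    return v;
}

typedef struct {
    uint16_t extension_dimension_bits;
    uint8_t  hierarchy_layer_index, temporal_id, nuh_layer_id;
    uint8_t  tref_present_flag, num_embedded_layers, hierarchy_channel;
    uint8_t  embedded_index[64];   /* hierarchy_ext_embedded_layer_index values */
} HierarchyExtDescriptor;

void parse_hierarchy_extension_descriptor(const uint8_t *buf,
                                          HierarchyExtDescriptor *d)
{
    BitReader br = { buf, 16 };    /* skip descriptor_tag and descriptor_length */
    d->extension_dimension_bits = (uint16_t)get_bits(&br, 16);
    d->hierarchy_layer_index    = (uint8_t)get_bits(&br, 6);
    d->temporal_id              = (uint8_t)get_bits(&br, 3);
    d->nuh_layer_id             = (uint8_t)get_bits(&br, 6);
    d->tref_present_flag        = (uint8_t)get_bits(&br, 1);
    d->num_embedded_layers      = (uint8_t)get_bits(&br, 6);
    d->hierarchy_channel        = (uint8_t)get_bits(&br, 6);
    get_bits(&br, 4);              /* reserved */
    for (unsigned i = 0; i < d->num_embedded_layers; i++) {
        d->embedded_index[i] = (uint8_t)get_bits(&br, 6);
        get_bits(&br, 2);          /* reserved */
    }
}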

In the Study of PDAM 7, a hierarchy extension descriptor (i.e., a hierarchy_extension_descriptor) cannot describe temporal dependency. Rather, hierarchy extension descriptors were designed to be used together with hierarchy descriptors in such a way that a hierarchy descriptor is used to describe only temporal dependency, whereas other types of dependency are described using hierarchy extension descriptors. This design of hierarchy extension descriptors and hierarchy descriptors creates a dependency such that a hierarchy extension descriptor cannot be used without the existence of a hierarchy descriptor.

Particular techniques of this disclosure may address this dependency. For example, an indication may be included in a hierarchy extension descriptor to indicate temporal scalability. For instance, the hierarchy extension descriptor may include an indication of whether a program element enhances a frame rate of a bitstream. In some examples, the indication of temporal scalability can be part of an extension_dimension_bits syntax element of a hierarchy extension descriptor. This may be done by updating the semantics of the extension_dimension_bits syntax element to support description of temporal scalability as follows: when all 16 bits of the extension_dimension_bits syntax element are equal to 0, temporal enhancement is indicated. For instance, all bits of the syntax element (e.g., extension_dimension_bits) being equal to a particular value (e.g., 0) indicates that a program element enhances the frame rate of a bitstream.

A hierarchy extension descriptor may indicate temporal scalability in other ways. For example, one of the reserved bits of the extension_dimension_bits syntax element may be used to indicate temporal scalability. For instance, a single bit of a syntax element (e.g., extension_dimension_bits) may indicate whether a program element enhances the frame rate of a bitstream. In another example, one bit is added to the extension_dimension_bits syntax element such that the extension_dimension_bits syntax element has 17 bits. In this example, the additional bit indicates temporal scalability. For instance, a syntax element (e.g., extension_dimension_bits) may consist of 17 bits, and the last bit of the syntax element may indicate whether a program element enhances the frame rate of a bitstream.

Hence, in accordance with an example of this disclosure, a video processing device, such as MANE 17 or source device 12, may determine whether a current program element enhances a frame rate of a bitstream. The current program element may include encoded video data. A video processing device may be or comprise a device configured to process video data, such as a video encoding device, a video decoding device, an intermediate video device such as a MANE, a video streaming device, a computing device generating files containing encoded video data, or another type of device. In this example, the video processing device may include, in a descriptor (e.g., a hierarchy extension descriptor) corresponding to the current program element, syntax elements (e.g., hierarchy_ext_embedded_layer_index syntax elements) indicating layer indices of program elements that need to be accessed and be present in decoding order before decoding the current program element. In this example, the video processing device may include, in the descriptor corresponding to the current program element, an indication of whether the current program element enhances the frame rate of the bitstream. In some examples, each of the program elements corresponds to a respective temporal sub-layer.

In a corresponding example, a video processing device, such as MANE 17 or destination device 14, may determine, based on syntax elements (e.g., hierarchy_ext_embedded_layer_index syntax elements) in a descriptor (e.g., a hierarchy extension descriptor) corresponding to a current program element, program elements that need to be accessed and be present in decoding order before decoding the current program element. In this example, the video processing device may determine, based on an indication in the descriptor corresponding to the current program element, whether the current program element enhances the frame rate of a bitstream. In this example, the bitstream may result from a set of one or more program elements that need to be accessed and be present in decoding order before decoding the current program element.

In the Study of PDAM 7, both HEVC operation point descriptors (e.g., an hevc_operation_point_descriptor) and HEVC extension descriptors (e.g., an hevc_extension_descriptor) provide means to signal operation point information. Such operation point information includes the signaling of profile, tier, and level (PTL) information. However, the signaling of PTL information for operation points in HEVC operation point descriptors and HEVC extension descriptors is not aligned with the signaling of PTL information at the codec level, i.e., in the SHVC and MV-HEVC standards. At the codec level, each layer that is included in an operation point is assigned its own PTL information.

Additional techniques of this disclosure may address this problem. For instance, in accordance with a technique of this disclosure, operation points and PTL information are signaled as follows. A list of PTL information sets, each including PTL information, is signaled in a descriptor for a program. A list of operation points that are available for a program is also signaled in a descriptor. In some examples, the descriptor including the list of operation points is a different descriptor from the descriptor containing the list of PTL information sets. In other examples, the descriptor including the list of operation points is the same descriptor as the descriptor containing the list of PTL information sets. Each layer included in an operation point as a layer to be decoded (i.e., as included in the sub-bitstream used to decode the operation point) is given an index referring to a set of PTL information. In other examples, each layer included in an operation point as an output layer is given an index referring to a set of PTL information.

Thus, in accordance with an example of this disclosure, a video processing device, such as source device 12, MANE 17, or another device, may signal, in a descriptor for a program comprising one or more elementary streams, a plurality of PTL syntax element sets. The one or more elementary streams may comprise encoded video data. For each respective layer of each respective operation point of a plurality of operation points of the program, the video processing device may assign respective PTL information to the respective layer of the respective operation point. Additionally, the video processing device may signal, in the descriptor for the program or another descriptor for the program, a plurality of operation point syntax element sets. In this example, each respective operation point syntax element set of the plurality of operation point syntax element sets specifies a respective operation point of the plurality of operation points. Furthermore, in this example, for each respective layer of the respective operation point, the respective operation point syntax element set includes a respective syntax element identifying a respective PTL syntax element set of the plurality of PTL syntax element sets specifying the respective PTL information assigned to the respective layer of the respective operation point.

In a corresponding example, a video processing device, such as MANE 17, destination device 14, or another device, may obtain, from a descriptor for a program comprising one or more elementary streams, a plurality of PTL syntax element sets. In this example, each respective PTL syntax element set of the plurality of PTL syntax element sets comprises syntax elements specifying respective PTL information. Furthermore, in this example, the video processing device may obtain, from the descriptor for the program, a plurality of operation point syntax element sets. In this example, each respective operation point syntax element set of the plurality of operation point syntax element sets specifies a respective operation point of a plurality of operation points. Additionally, in this example, for each respective operation point syntax element set of the plurality of operation point syntax element sets, the video processing device may determine, for each respective layer of the respective operation point specified by the respective operation point syntax element set, based on a respective syntax element in the respective operation point syntax element set, which of the PTL syntax element sets specifies the PTL information assigned to the respective layer.

Aggregation of elementary streams of an operation point described in sub-clause 2.17.4 of the Study of PDAM 7 can be summarized as follows. If an operation point is signaled, either in an hevc_operation_point_descriptor or an hevc_extension_descriptor, an HEVC layer list for the operation point is established based on the elementary streams or layer list described for the operation point in the descriptor. Otherwise, if neither an hevc_operation_point_descriptor nor an hevc_extension_descriptor is present, each elementary stream is considered an operation point, and an HEVC layer list is established based on either the hierarchy_descriptor or the hierarchy_extension_descriptor, if present. Otherwise, if no hierarchy descriptors are used either, a default list of operation points, described in Table Amd7-5 of the Study of PDAM 7, applies. Table Amd7-5 is reproduced below.

TABLE Amd7-5  Default HEVC layer list if no hierarchy descriptors are used

Existing stream types     OP₁     OP₂           OP₃                 OP₄
0x24                      0x24
0x24, 0x25                0x24    0x24, 0x25
0x24, 0x27                0x24    0x24, 0x27
0x24, 0x25, 0x27          0x24    0x24, 0x25    0x24, 0x25, 0x27
0x24, 0x25, 0x27, 0x28    0x24    0x24, 0x25    0x24, 0x25, 0x27    0x24, 0x25, 0x27, 0x28 (cf. note below)
0x24, 0x29                0x24    0x24, 0x29
0x24, 0x29, 0x2A          0x24    0x24, 0x29    0x24, 0x29, 0x2A
0x24, 0x25, 0x29          0x24    0x24, 0x25    0x24, 0x25, 0x29
0x24, 0x25, 0x29, 0x2A    0x24    0x24, 0x25    0x24, 0x25, 0x29    0x24, 0x25, 0x29, 0x2A

The above method for aggregation of elementary streams may have at least the following problems. In a first problem with the above method for aggregation of elementary streams, when no descriptor for an operation point is present, it is assumed that each elementary stream is an operation point. This may create a backward compatibility problem with Hattori et al., “Text of ISO/IEC 13818-1:2013/FDAM 5—Transport of MVC depth video sub-bitstream and support for HEVC low delay coding mode,” ISO/IEC JTC1/SC29/WG11, MPEG2014/N14315, April 2014, Valencia, ES (hereinafter, “Amendment 5 of ISO/IEC 13818-1:2013”). In Amendment 5 of ISO/IEC 13818-1:2013, an HEVC temporal video sub-bitstream together with all of its associated HEVC temporal video subsets is considered as one operation point. In other words, an elementary stream that only enhances the temporal aspect of its reference elementary stream is not considered to be another operation point. Therefore, when no descriptor for an operation point is present, only an elementary stream with stream type 0x24, 0x27, or 0x29 should be considered as an operation point by itself, whereas elementary streams with stream type 0x25, 0x28, or 0x2A should be considered as part of the operation point that is associated with the elementary stream with type 0x24, 0x27, or 0x29 that those elementary streams enhance. In the Study of PDAM 7, type 0x24 indicates an HEVC video stream, an HEVC temporal video sub-bitstream, or an HEVC base sub-partition. Furthermore, in the Study of PDAM 7, type 0x27 indicates an HEVC enhancement sub-partition which includes TemporalId 0 of an HEVC video stream conforming to one or more profiles defined in Annex G of ITU-T Rec. H.265|ISO/IEC 23008-2. Furthermore, in the Study of PDAM 7, type 0x28 indicates an HEVC temporal enhancement sub-partition of an HEVC video stream conforming to one or more profiles defined in Annex G of ITU-T Rec. H.265|ISO/IEC 23008-2. In the Study of PDAM 7, type 0x29 indicates an HEVC enhancement sub-partition which includes TemporalId 0 of an HEVC video stream conforming to one or more profiles defined in Annex H of ITU-T Rec. H.265|ISO/IEC 23008-2. In the Study of PDAM 7, type 0x2A indicates an HEVC temporal enhancement sub-partition of an HEVC video stream conforming to one or more profiles defined in Annex H of ITU-T Rec. H.265|ISO/IEC 23008-2.

In a second problem with the above method for aggregation of elementary streams, hevc_operation_point_descriptor and hevc_extension_descriptor are proposed to be replaced by a new hevc_extension_descriptor. Consequently, the descriptor for aggregation of elementary streams of an operation point must also be updated. As defined in the Study of PDAM 7, HEVC layer component aggregation may be the concatenation of all HEVC layer components with the same output time from all HEVC sub-partitions indicated in an HEVC layer list, in the order indicated by the HEVC layer list, resulting in a valid access unit as defined in Annex F of Rec. ITU-T H.265|ISO/IEC 23008-2.

In accordance with a technique of this disclosure, the aggregation of elementary streams may be modified as follows. If the descriptor that carries operation point information is present for a program, an HEVC layer list for each operation point described in the descriptor is established based on the information for the operation point and shall contain the layers that are included for the operation point. Otherwise, if the descriptor that carries operation point information is not present for a program, each elementary stream ES_i with stream type 0x24, 0x27, or 0x29 corresponds to a single target operation point OP_i. The aggregation of the layers included in ES_i and the elementary streams pointed to by the syntax element hierarchy_ext_embedded_layer_index of the hierarchy_extension_descriptor for ES_i, if present, ordered according to increasing LayerId, results in the HEVC layer list, as illustrated in the sketch below. If an elementary stream signaled by hierarchy_ext_embedded_layer_index has further dependencies, these dependencies shall be prepended in a recursive manner.
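
The following C sketch illustrates the recursive prepending rule just described, using a hypothetical per-elementary-stream record distilled from the descriptors (the EsInfo structure is not defined by MPEG-2 Systems). Dependencies signaled by hierarchy_ext_embedded_layer_index are collected depth-first before the stream that references them; the layers carried in the collected streams would then be ordered by increasing LayerId to form the HEVC layer list.

#include <stdint.h>

#define MAX_ES 64

typedef struct {
    uint8_t stream_type;            /* e.g., 0x24, 0x25, 0x27, ... */
    uint8_t hierarchy_layer_index;  /* from the hierarchy extension descriptor */
    uint8_t num_embedded;
    uint8_t embedded_index[MAX_ES]; /* hierarchy_ext_embedded_layer_index values */
} EsInfo;

/* Each elementary stream with stream type 0x24, 0x27, or 0x29 seeds one
 * target operation point OP_i. */
static int seeds_operation_point(uint8_t stream_type)
{
    return stream_type == 0x24 || stream_type == 0x27 || stream_type == 0x29;
}

/* Depth-first collection: dependencies are prepended recursively before
 * the stream itself; `seen` (zero-initialized by the caller) guards
 * against malformed circular references. */
static void collect(const EsInfo es[], int n, uint8_t idx,
                    uint8_t out[], int *count, uint8_t seen[])
{
    if (seen[idx])
        return;
    seen[idx] = 1;
    for (int i = 0; i < n; i++) {
        if (es[i].hierarchy_layer_index != idx)
            continue;
        for (int j = 0; j < es[i].num_embedded; j++)
            collect(es, n, es[i].embedded_index[j], out, count, seen);
        out[(*count)++] = idx;      /* the referencing stream comes last */
    }
}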

This disclosure describes improvements to the design of the MPEG-2 Transport Stream (TS) for carriage of HEVC extensions. A summary of the techniques of this disclosure is given herein, with a detailed implementation of some techniques provided in later sections. Some of these techniques may be applied independently, and some of them may be applied in combination.

FIG. 2 is a block diagram illustrating an example video encoder 20. FIG. 2 is provided for purposes of explanation and should not be considered limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video encoder 20 in the context of HEVC coding. However, the techniques of this disclosure may be applicable to other coding standards or methods.

In the example of FIG. 2, video encoder 20 includes a prediction processing unit 100, a residual generation unit 102, a transform processing unit 104, a quantization unit 106, an inverse quantization unit 108, an inverse transform processing unit 110, a reconstruction unit 112, a filter unit 114, a decoded picture buffer 116, and an entropy encoding unit 118. Prediction processing unit 100 includes an inter-prediction processing unit 120 and an intra-prediction processing unit 126. Inter-prediction processing unit 120 includes a motion estimation unit 122 and a motion compensation unit 124. In other examples, video encoder 20 may include more, fewer, or different functional components.

In some examples, video encoder 20 may further include video data memory 121. Video data memory 121 may store video data to be encoded by the components of video encoder 20. The video data stored in video data memory 121 may be obtained, for example, from video source 18. Decoded picture buffer 116 may be a reference picture memory that stores reference video data for use in encoding video data by video encoder 20, e.g., in intra- or inter-coding modes. Video data memory 121 and decoded picture buffer 116 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 121 and decoded picture buffer 116 may be provided by the same memory device or separate memory devices. In various examples, video data memory 121 may be on-chip with other components of video encoder 20, or off-chip relative to those components.

Video encoder 20 may receive video data. Video encoder 20 may encode each CTU in a slice of a picture of the video data. Video encoder 20 may encode CUs of a CTU to generate encoded representations of the CUs (i.e., coded CUs). As part of encoding a CU, prediction processing unit 100 may partition the coding blocks associated with the CU among one or more PUs of the CU. Thus, each PU may be associated with a luma prediction block and corresponding chroma prediction blocks. Video encoder 20 and video decoder 30 may support PUs having various sizes. The size of a CU may refer to the size of the luma coding block of the CU, and the size of a PU may refer to the size of a luma prediction block of the PU. Assuming that the size of a particular CU is 2N×2N, video encoder 20 and video decoder 30 may support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PU sizes of 2N×2N, 2N×N, N×2N, N×N, or similar for inter prediction. Video encoder 20 and video decoder 30 may also support asymmetric partitioning for PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction.

Inter-prediction processing unit 120 may generate predictive data for a PU by performing inter prediction on each PU of a CU. The predictive data for the PU may include predictive blocks of the PU and motion information for the PU. Inter-prediction processing unit 120 may perform different operations for a PU of a CU depending on whether the PU is in an I slice, a P slice, or a B slice. In an I slice, all PUs are intra predicted. Hence, if the PU is in an I slice, inter-prediction processing unit 120 does not perform inter prediction on the PU.

If a PU is in a P slice, motion estimation unit 122 may search the reference pictures in a list of reference pictures (e.g., “RefPicList0”) for a reference region for the PU. The reference region for the PU may be a region, within a reference picture, that contains samples that most closely correspond to the prediction blocks of the PU. Motion estimation unit 122 may generate a reference index indicating a position in RefPicList0 of the reference picture containing the reference region for the PU. In addition, motion estimation unit 122 may generate a motion vector indicating a spatial displacement between a coding block of the PU and a reference location associated with the reference region. For instance, the motion vector may be a two-dimensional vector that provides an offset from the coordinates in the current picture to coordinates in a reference picture. Motion estimation unit 122 may output the reference index and the motion vector as the motion information of the PU. Motion compensation unit 124 may generate the predictive blocks of the PU based on actual or interpolated samples at the reference location indicated by the motion vector of the PU.

If a PU is in a B slice, motion estimation unit 122 may perform uni-prediction or bi-prediction for the PU. To perform uni-prediction for the PU, motion estimation unit 122 may search the reference pictures of RefPicList0 or a second reference picture list (“RefPicList1”) for a reference region for the PU. Motion estimation unit 122 may output, as the motion information of the PU, a reference index indicating a position in RefPicList0 or RefPicList1 of the reference picture that contains the reference region, a motion vector indicating a spatial displacement between a prediction block of the PU and a reference location associated with the reference region, and one or more prediction direction indicators indicating whether the reference picture is in RefPicList0 or RefPicList1. Motion compensation unit 124 may generate the predictive blocks of the PU based at least in part on actual or interpolated samples at the reference location indicated by the motion vector of the PU.

To perform bi-directional inter prediction for a PU, motion estimation unit 122 may search the reference pictures in RefPicList0 for a reference region for the PU and may also search the reference pictures in RefPicList1 for another reference region for the PU. Motion estimation unit 122 may generate reference indexes indicating positions in RefPicList0 and RefPicList1 of the reference pictures containing the reference regions. In addition, motion estimation unit 122 may generate motion vectors indicating spatial displacements between the reference locations associated with the reference regions and a prediction block of the PU. The motion information of the PU may include the reference indexes and the motion vectors of the PU. Motion compensation unit 124 may generate the predictive blocks of the PU based at least in part on actual or interpolated samples at the reference locations indicated by the motion vectors of the PU.

Intra-prediction processing unit 126 may generate predictive data for a PU by performing intra prediction on the PU. The predictive data for the PU may include predictive blocks for the PU and various syntax elements. Intra-prediction processing unit 126 may perform intra prediction on PUs in I slices, P slices, and B slices.

To perform intra prediction on a PU, intra-prediction processing unit 126 may use multiple intra prediction modes to generate multiple sets of predictive blocks for the PU. When performing intra prediction using a particular intra prediction mode, intra-prediction processing unit 126 may generate predictive blocks for the PU using a particular set of samples from neighboring blocks. The neighboring blocks may be above, above and to the right, above and to the left, or to the left of the prediction blocks of the PU, assuming a left-to-right, top-to-bottom encoding order for PUs, CUs, and CTUs. Intra-prediction processing unit 126 may use various numbers of intra prediction modes, e.g., 33 directional intra prediction modes. In some examples, the number of intra prediction modes may depend on the size of the prediction blocks of the PU.

Prediction processing unit 100 may select the predictive data for PUs of a CU from among the predictive data generated by inter-prediction processing unit 120 for the PUs or the predictive data generated by intra-prediction processing unit 126 for the PUs. In some examples, prediction processing unit 100 selects the predictive data for the PUs of the CU based on rate/distortion metrics of the sets of predictive data. The predictive blocks of the selected predictive data may be referred to herein as the selected predictive blocks.

Residual generation unit 102 may generate, based on the coding blocks (e.g., luma, Cb, and Cr coding blocks) of a CU and the selected predictive blocks (e.g., predictive luma, Cb, and Cr blocks) of the PUs of the CU, residual blocks (e.g., luma, Cb, and Cr residual blocks) of the CU. For instance, residual generation unit 102 may generate the residual blocks of the CU such that each sample in the residual blocks has a value equal to a difference between a sample in a coding block of the CU and a corresponding sample in a corresponding selected predictive block of a PU of the CU.

Transform processing unit 104 may perform quad-tree partitioning to partition the residual blocks of a CU into transform blocks associated with TUs of the CU. Thus, a TU may be associated with a luma transform block and two corresponding chroma transform blocks. The sizes and positions of the luma and chroma transform blocks of TUs of a CU may or may not be based on the sizes and positions of prediction blocks of the PUs of the CU.

Transform processing unit 104 may generate transform coefficient blocks for each TU of a CU by applying one or more transforms to the transform blocks of the TU. Transform processing unit 104 may apply various transforms to a transform block associated with a TU. For example, transform processing unit 104 may apply a discrete cosine transform (DCT), a directional transform, or a conceptually-similar transform to a transform block. In some examples, transform processing unit 104 does not apply transforms to a transform block. In such examples, the transform block may be treated as a transform coefficient block.

Quantization unit 106 may quantize the transform coefficients in a coefficient block. The quantization process may reduce the bit depth associated with some or all of the transform coefficients. For example, an n-bit transform coefficient may be rounded down to an m-bit transform coefficient during quantization, where n is greater than m. Quantization unit 106 may quantize a coefficient block associated with a TU of a CU based on a quantization parameter (QP) value associated with the CU. Video encoder 20 may adjust the degree of quantization applied to the coefficient blocks associated with a CU by adjusting the QP value associated with the CU. Quantization may introduce loss of information; thus, quantized transform coefficients may have lower precision than the original ones.
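
As an illustration of the QP mechanism described above, the following C sketch applies a uniform scalar quantizer using the approximate relation Qstep ≈ 2^((QP−4)/6), under which the step size roughly doubles for every increase of 6 in QP. This is a simplified illustration, not the normative HEVC quantization procedure.

#include <math.h>

/* Illustrative uniform quantizer: dividing by the step size and truncating
 * discards precision, which is the lossy part of the pipeline. */
int quantize(int coeff, int qp)
{
    double step = pow(2.0, (qp - 4) / 6.0);   /* approximate step size */
    return (int)(coeff / step);
}

/* Corresponding reconstruction used on the decoder side. */
int inverse_quantize(int level, int qp)
{
    double step = pow(2.0, (qp - 4) / 6.0);
    return (int)(level * step);
}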

Inverse quantization unit 108 and inverse transform processing unit 110 may apply inverse quantization and inverse transforms to a coefficient block, respectively, to reconstruct a residual block from the coefficient block. Reconstruction unit 112 may add the reconstructed residual block to corresponding samples from one or more predictive blocks generated by prediction processing unit 100 to produce a reconstructed transform block associated with a TU. By reconstructing transform blocks for each TU of a CU in this way, video encoder 20 may reconstruct the coding blocks of the CU.

Filter unit 114 may perform one or more deblocking operations to reduce blocking artifacts in the coding blocks associated with a CU. Decoded picture buffer 116 may store the reconstructed coding blocks after filter unit 114 performs the one or more deblocking operations on the reconstructed coding blocks. Inter-prediction processing unit 120 may use a reference picture containing the reconstructed coding blocks to perform inter prediction on PUs of other pictures. In addition, intra-prediction processing unit 126 may use reconstructed coding blocks in decoded picture buffer 116 to perform intra prediction on other PUs in the same picture as the CU.

Entropy encoding unit 118 may receive data from other functional components of video encoder 20. For example, entropy encoding unit 118 may receive coefficient blocks from quantization unit 106 and may receive syntax elements from prediction processing unit 100. Entropy encoding unit 118 may perform one or more entropy encoding operations on the data to generate entropy-encoded data. For example, entropy encoding unit 118 may perform a CABAC operation, a CAVLC operation, a variable-to-variable (V2V) length coding operation, a syntax-based context-adaptive binary arithmetic coding (SBAC) operation, a Probability Interval Partitioning Entropy (PIPE) coding operation, an Exponential-Golomb encoding operation, or another type of entropy encoding operation on the data. Video encoder 20 may output a bitstream including entropy-encoded data generated by entropy encoding unit 118.

FIG. 3 is a block diagram illustrating an example video decoder 30. FIG. 3 is provided for purposes of explanation and should not be considered limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video decoder 30 in the context of HEVC coding. However, the techniques of this disclosure may be applicable to other coding standards or methods.

In the example of FIG. 3, video decoder 30 includes an entropy decoding unit 150, a prediction processing unit 152, an inverse quantization unit 154, an inverse transform processing unit 156, a reconstruction unit 158, a filter unit 160, and a decoded picture buffer 162. Prediction processing unit 152 includes a motion compensation unit 164 and an intra-prediction processing unit 166. In other examples, video decoder 30 may include more, fewer, or different functional components.

In some examples, video decoder 30 may further include video data memory. The video data memory may store video data, such as an encoded video bitstream, to be decoded by the components of video decoder 30. The video data stored in the video data memory may be obtained, for example, from channel 16, e.g., from a local video source, such as a camera, via wired or wireless network communication of video data, or by accessing physical data storage media. The video data memory may form a coded picture buffer (CPB) 151 and a decoded picture buffer 162. CPB 151 stores encoded video data from an encoded video bitstream. Decoded picture buffer 162 may be a reference picture memory that stores reference video data for use in decoding video data by video decoder 30, e.g., in intra- or inter-coding modes. CPB 151 and decoded picture buffer 162 may be formed by any of a variety of memory devices, such as DRAM, including SDRAM, MRAM, RRAM, or other types of memory devices. CPB 151 and decoded picture buffer 162 may be provided by the same memory device or separate memory devices. In various examples, the video data memory may be on-chip with other components of video decoder 30, or off-chip relative to those components.

CPB 151 may receive and store encoded video data (e.g., NAL units) of a bitstream. Entropy decoding unit 150 may receive NAL units from CPB 151 and parse the NAL units to obtain syntax elements from the bitstream. Entropy decoding unit 150 may entropy decode entropy-encoded syntax elements in the NAL units. Prediction processing unit 152, inverse quantization unit 154, inverse transform processing unit 156, reconstruction unit 158, and filter unit 160 may generate decoded video data based on the syntax elements obtained from the bitstream.

The NAL units of the bitstream may include coded slice NAL units. As part of decoding the bitstream, entropy decoding unit 150 may parse and entropy decode syntax elements from the coded slice NAL units. Each of the coded slices may include a slice header and slice data. The slice header may contain syntax elements pertaining to a slice.

In addition to decoding syntax elements from the bitstream, video decoder 30 may perform a decoding operation on a CU. By performing the decoding operation on a CU, video decoder 30 may reconstruct coding blocks of the CU.

As part of performing a decoding operation on a CU, inverse quantization unit 154 may inverse quantize, i.e., de-quantize, coefficient blocks associated with TUs of the CU. Inverse quantization unit 154 may use a QP value associated with the CU of the TU to determine a degree of quantization and, likewise, a degree of inverse quantization for inverse quantization unit 154 to apply. That is, the compression ratio, i.e., the ratio between the number of bits used to represent the original sequence and the number used to represent the compressed one, may be controlled by adjusting the value of the QP used when quantizing transform coefficients. The compression ratio may also depend on the method of entropy coding employed.

After inverse quantization unit 154 inverse quantizes a coefficient block, inverse transform processing unit 156 may apply one or more inverse transforms to the coefficient block in order to generate a residual block associated with the TU. For example, inverse transform processing unit 156 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the coefficient block.

If a PU is encoded using intra prediction, intra-prediction processing unit 166 may perform intra prediction to generate predictive blocks for the PU. Intra-prediction processing unit 166 may use an intra prediction mode to generate the predictive blocks (e.g., predictive luma, Cb, and Cr blocks) for the PU based on the prediction blocks of spatially-neighboring PUs. Intra-prediction processing unit 166 may determine the intra prediction mode for the PU based on one or more syntax elements decoded from the bitstream.

Prediction processing unit 152 may construct a first reference picture list (RefPicList0) and a second reference picture list (RefPicList1) based on syntax elements extracted from the bitstream. Furthermore, if a PU is encoded using inter prediction, entropy decoding unit 150 may obtain motion information for the PU. Motion compensation unit 164 may determine, based on the motion information of the PU, one or more reference regions for the PU. Motion compensation unit 164 may generate, based on samples at the one or more reference regions for the PU, predictive blocks (e.g., predictive luma, Cb, and Cr blocks) for the PU.

Reconstruction unit 158 may use the residual values from the transform blocks (e.g., luma, Cb, and Cr transform blocks) of TUs of a CU and the predictive blocks (e.g., luma, Cb, and Cr blocks) of the PUs of the CU, i.e., either intra-prediction data or inter-prediction data, as applicable, to reconstruct the coding blocks (e.g., luma, Cb, and Cr coding blocks) of the CU. For example, reconstruction unit 158 may add samples of the transform blocks (e.g., luma, Cb, and Cr transform blocks) to corresponding samples of the predictive blocks (e.g., predictive luma, Cb, and Cr blocks) to reconstruct the coding blocks (e.g., luma, Cb, and Cr coding blocks) of the CU.

Filter unit 160 may perform a deblocking operation to reduce blocking artifacts associated with the coding blocks (e.g., luma, Cb, and Cr coding blocks) of the CU. Video decoder 30 may store the coding blocks (e.g., luma, Cb, and Cr coding blocks) of the CU in decoded picture buffer 162. Decoded picture buffer 162 may provide reference pictures for subsequent motion compensation, intra prediction, and presentation on a display device, such as display device 32 of FIG. 1. For instance, video decoder 30 may perform, based on the blocks (e.g., luma, Cb, and Cr blocks) in decoded picture buffer 162, intra prediction or inter prediction operations on PUs of other CUs. In this way, video decoder 30 may extract, from the bitstream, transform coefficient levels of the significant luma coefficient block, inverse quantize the transform coefficient levels, apply a transform to the transform coefficient levels to generate a transform block, generate, based at least in part on the transform block, a coding block, and output the coding block for display.

The following section of this disclosure describes example implementation details of particular techniques of this disclosure. In the following section, underlined text indicates added text.

As mentioned above, an indication may be included in a hierarchy extension descriptor to indicate temporal scalability. For instance, the hierarchy extension descriptor may include an indication of whether a program element enhances a frame rate of a bitstream. Table Amd. 7-3, below, is an example syntax table of a hierarchy extension descriptor modified to accommodate temporal scalability.

TABLE Amd. 7-3  Hierarchy extension descriptor

Syntax                                         No. of bits   Mnemonic
hierarchy_extension_descriptor( ) {
    descriptor_tag                             8             uimsbf
    descriptor_length                          8             uimsbf
    extension_dimension_bits                   16            bslbf
    no_temporal_scalability_flag               1             bslbf
    reserved                                   7             bslbf
    hierarchy_layer_index                      6             uimsbf
    temporal_id                                3             uimsbf
    nuh_layer_id                               6             uimsbf
    tref_present_flag                          1             bslbf
    num_embedded_layers                        6             uimsbf
    hierarchy_channel                          6             uimsbf
    reserved                                   4             bslbf
    for( i = 0; i < num_embedded_layers; i++ ) {
        hierarchy_ext_embedded_layer_index     6             uimsbf
        reserved                               2             bslbf
    }
}

In the example of Table Amd. 7-3, no_temporal_scalability_flag is a 1-bit flag which, when set to ‘0’, indicates that the associated program element enhances the frame rate of the bitstream resulting from the program elements referenced by the hierarchy_ext_embedded_layer_index. The value of ‘1’ for this flag is reserved. The semantics of the other syntax elements of the hierarchy_extension_descriptor may remain the same as indicated above. Thus, in some examples, the indication of whether the current program element enhances the frame rate of a bitstream may consist of a 1-bit flag separate from a syntax element indicating enhancements of the current program element, relative to a base layer.

In another example to indicate temporal scalability in a hierarchy extension descriptor, the semantics of extension_dimension_bits are updated as follows:

extension_dimension_bits—A 16-bit field indicating the possible enhancement of the associated program element from the base layer resulting from the program element of the layer with nuh_layer_id equal to 0. When extension_dimension_bits is equal to 0, it indicates that the associated program element enhances the frame rate of the bitstream resulting from the program elements referenced by the hierarchy_ext_embedded_layer_index.

Thus, in some examples of this disclosure, the indication of whether the current program element enhances the frame rate of a bitstream may be part of a syntax element (e.g., extension_dimension_bits) that indicates enhancements of the current program element, relative to a base layer. Furthermore, in some such examples, all bits of the syntax element being equal to a particular value (e.g., 0) indicates that the current program element enhances the frame rate of the bitstream.

In another example of indicating temporal scalability in a hierarchy extension descriptor, one of the reserved bits of the syntax element extension_dimension_bits is used to indicate temporal scalability. Thus, in some examples of this disclosure, a single bit of a syntax element (e.g., extension_dimension_bits) indicates whether a current program element enhances the frame rate of a bitstream. This example may be implemented by changing Table 7-4 as shown below in Table Amd. 7-4:

TABLE Amd. 7-4  Semantics of extension_dimension_bits

Index to bits   Description
0               Multi-view enhancement
1               Spatial scalability, including SNR
2               Depth enhancement
3               AVC base layer
4               MPEG-2 base layer
5               Temporal enhancement
6~15            Reserved
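
The following C sketch contrasts the two extension_dimension_bits alternatives described above: the all-zero interpretation and the repurposed reserved bit of Table Amd. 7-4. Treating index 0 as the most significant bit is an assumption consistent with the MSB-first serialization of the field.

#include <stdint.h>

/* Alternative 1: all 16 bits equal to 0 indicate that the program element
 * enhances the frame rate (temporal enhancement only). */
static int temporal_only_all_zero(uint16_t extension_dimension_bits)
{
    return extension_dimension_bits == 0;
}

/* Alternative 2 (Table Amd. 7-4): bit index 5 indicates temporal
 * enhancement; index 0 is assumed to denote the most significant bit. */
static int temporal_enhancement_bit(uint16_t extension_dimension_bits)
{
    return (extension_dimension_bits >> (15 - 5)) & 1;
}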

As indicated above, particular techniques of this disclosure provide for indicating, in a descriptor, PTL information for each layer of an operation point. These techniques may be implemented in various ways. For example, the signaling of operation point and PTL information may be implemented using an HEVC extension descriptor including the syntax shown in the table below.

HEVC Extension Descriptor

Syntax                                               No. of bits   Mnemonic
HEVC_extension_descriptor( ) {
    descriptor_tag                                   8             uimsbf
    descriptor_length                                8             uimsbf
    num_profile_tier_level                           8             uimsbf
    for( i = 0; i < num_profile_tier_level; i++ ) {
        profile_space                                2             uimsbf
        tier_flag                                    1             bslbf
        profile_idc                                  5             uimsbf
        profile_compatibility_idc                    32            bslbf
        progressive_source_flag                      1             bslbf
        interlaced_source_flag                       1             bslbf
        non_packed_constraint_flag                   1             bslbf
        frame_only_constraint_flag                   1             bslbf
        reserved_zero_44bits                         44            bslbf
        level_idc                                    8             uimsbf
    }
    num_operation_points                             8             uimsbf
    for( i = 0; i < num_operation_points; i++ ) {
        max_temporal_id                              3             bslbf
        reserved_bits                                5             bslbf
        num_layers_in_operation_point                6             uimsbf
        reserved_bits                                2             bslbf
        for( j = 0; j < num_layers_in_operation_point; j++ ) {
            reserved_bits                            1             bslbf
            layer_id_included                        6             bslbf
            output_layer_flag                        1             bslbf
            ptl_index                                8             bslbf
        }
        average_bit_rate                             16            uimsbf
        maximum_bit_rate                             16            uimsbf
        constant_frame_rate_mode                     1             bslbf
        frame_rate                                   15            uimsbf
    }
}

In the table above, num_profile_tier_level is an 8-bit field specifying the number of profile, tier, and level structures specified by this descriptor. Thus, in some examples, a video processing device may determine, based on a syntax element (e.g., num_profile_tier_level) in the descriptor for the program, the number of PTL syntax element sets in the plurality of PTL syntax element sets. Similarly, in some examples, a video processing device may signal, in the first descriptor for the program, a syntax element (e.g., num_profile_tier_level) indicating the number of PTL syntax element sets in the plurality of PTL syntax element sets.

profile_space is a 2-bit field specifying the context for the interpretation of profile_idc and profile_compatibility_idc for all values of i in the range of 0 to 31, inclusive. In this example, profile_space shall not be assigned values other than those specified in Annex A or subclause G.11 or subclause H.11 of Rec. ITU-T H.265|ISO/IEC 23008-2. Other values of profile_space are reserved for future use by ITU-T|ISO/IEC.

tier_flag is a 1-bit field specifying the tier context for the interpretation of level_idc as specified in Annex A or subclause G.11 or subclause H.11 of Rec. ITU-T H.265|ISO/IEC 23008-2.

profile_idc is a 5-bit field that, when profile_space is equal to 0, indicates a profile to which the CVS resulting from HEVC layer aggregation of the HEVC sub-partition included in the specified operation point, and of all HEVC sub-partitions on which this sub-partition depends, conforms, as specified in Annex A of Rec. ITU-T H.265|ISO/IEC 23008-2. profile_idc shall not be assigned values other than those specified in Annex A, G.11, or H.11 of Rec. ITU-T H.265|ISO/IEC 23008-2. Other values of profile_idc are reserved for future use by ITU-T|ISO/IEC.

profile_compatibility_indication, progressive_source_flag, interlaced_source_flag, non_packed_constraint_flag, frame_only_constraint_flag, reserved_zero_44bits, level_idc: When the HEVC extension video descriptor applies to an HEVC enhancement sub-partition, these fields shall be coded according to the semantics defined in Rec. ITU-T H.265|ISO/IEC 23008-2 for general_profile_space, general_tier_flag, general_profile_idc, general_profile_compatibility_flag[i], general_progressive_source_flag, general_interlaced_source_flag, general_non_packed_constraint_flag, general_frame_only_constraint_flag, general_reserved_zero_44bits, general_level_idc, respectively, for the corresponding HEVC sub-partition, and the HEVC video stream resulting from HEVC layer aggregation of the HEVC sub-partition to which the HEVC video descriptor is associated, with all HEVC sub-partitions on which this sub-partition depends, shall conform to the information signaled by these fields.

level_idc is an 8-bit field indicating a level to which the CVS conforms as specified in Annex A, G.11 or H.11 of Rec. ITU-T H.265|ISO/IEC 23008-2. level_idc shall not be assigned values other than those specified in Annex A, G.11 or H.11 of Rec. ITU-T H.265|ISO/IEC 23008-2. Other values of level_idc are reserved for future use by ITU-T|ISO/IEC.

Thus, in some examples, a video processing device may determine, based on a respective profile syntax element (e.g., profile_idc) in the respective PTL syntax element set, a profile to which a coded video sequence conforms. Furthermore, the video processing device may determine, based on a respective tier syntax element (e.g., tier_flag) in the respective PTL syntax element set, a context for interpretation of a respective level indicator syntax element (e.g., level_idc) in the respective PTL syntax element set. In such examples, the video processing device may determine, based on the respective level indicator syntax element in the respective PTL syntax element set, a level to which the coded video sequence conforms.
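
As one illustration of these determinations, the following sketch interprets a PTL set parsed by the earlier descriptor sketch. It relies on the HEVC conventions that tier_flag equal to 0 or 1 selects the Main or High tier and that level_idc is 30 times the level number (e.g., level_idc of 123 corresponds to Level 4.1); the function name is hypothetical.

    def describe_ptl(ptl: dict) -> str:
        # tier_flag provides the context in which level_idc is interpreted.
        tier = "High" if ptl["tier_flag"] else "Main"
        return "profile_idc=%d, %s tier, Level %.1f" % (
            ptl["profile_idc"], tier, ptl["level_idc"] / 30.0)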

Similarly, in some examples, for each respective PTL syntax element set of the plurality of PTL syntax element sets, a video processing device may signal, in the respective PTL syntax element set, a respective profile syntax element (e.g., profile_idc) specifying a profile to which a coded video sequence conforms. Furthermore, the video processing device may signal, in the respective PTL syntax element set, a respective tier syntax element (e.g., tier_flag). The respective tier syntax element may specify a context for interpretation of a respective level indicator syntax element (e.g., level_idc) in the respective PTL syntax element set. In such examples, the video processing device may signal, in the respective PTL syntax element set, the respective level indicator syntax element. The respective level indicator syntax element may indicate a level to which a coded video sequence conforms.

num_operation_points is an 8-bit field specifying the number of operation points specified by this descriptor. Thus, in some examples, a video processing device may determine, based on a syntax element (e.g., num_operation_points) in the descriptor, the number of operation point syntax element sets in the plurality of operation point syntax element sets. Similarly, in some examples, a video processing device may signal a syntax element (e.g., num_operation_points) in the descriptor indicating the number of operation point syntax element sets in the plurality of operation point syntax element sets. In other examples, the syntax element may be determined based on a syntax element in a descriptor separate from a descriptor including the PTL syntax element sets. Likewise, in some examples, a video processing device may signal this syntax element in a descriptor separate from a descriptor including the PTL syntax element sets.

max_temporal_id is a 3-bit field specifying the highest TemporalId of the NAL units of the layers in the i-th operation point.

num_layers_in_operation_point is a 6-bit field specifying the number of layers that are included in the i-th operation point. Thus, in some examples, for each respective operation point syntax element set of a plurality of operation point syntax element sets, a video processing device may signal the number of layers of the respective operation point specified by the respective operation point syntax element set. Similarly, in some examples, for each respective operation point syntax element set of the plurality of operation point syntax element sets, the video processing device may determine, based on a syntax element (e.g., num_layers_in_operation_point) in a descriptor, the number of layers of the respective operation point specified by the respective operation point syntax element set. In other examples, the syntax element (e.g., num_layers_in_operation_point) may be signaled in a descriptor separate from the descriptor including the PTL syntax element sets.

layer_id_included is a 6-bit field specifying the nuh_layer_id of the layer that is included in the i-th operation point.

output_layer_flag is a 1-bit field which, when assigned value ‘1’, indicates that a layer with nuh_layer_id equal to layer_id_included is an output layer when the i-th operation point is decoded. When output_layer_flag is assigned value ‘0’, the layer with nuh_layer_id equal to layer_id_included is not an output layer when the i-th operation point is decoded.

ptl_index is an 8-bit field specifying the index of profile, tier and level that is assigned to the j-th layer in the i-th operation point.
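
Continuing the earlier parsing sketch, the per-layer indexing can be illustrated as follows: for each layer of each operation point, ptl_index selects which of the signaled PTL sets applies to that layer. The helper name is hypothetical.

    def ptl_for_each_layer(descriptor: dict):
        # Yield (operation point index, nuh_layer_id, PTL set) triples.
        for i, op in enumerate(descriptor["operation_points"]):
            for layer in op["layers"]:
                ptl = descriptor["ptl_sets"][layer["ptl_index"]]
                yield i, layer["layer_id_included"], ptl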

average_bit_rate is a 16-bit field indicating the average bit rate, in 1000 bits per second, of the HEVC layered video stream corresponding to the i-th operation point.

maximum_bit_rate is a 16-bit field indicating the maximum bit rate, in kbit per second, of the HEVC layered video stream corresponding to the i-th operation point.

constant_frame_rate_mode is a 1-bit field specifying how the frame_rate, as specified below, is interpreted.

frame_rate is a 15-bit field indicating the maximum picture rate of the HEVC layered video stream corresponding to the i-th operation point. If constant_frame_rate_mode equals 0, the frame_rate is measured in frames per second. Otherwise, if constant_frame_rate_mode equals 1, the frame_rate is measured in frames per 1.001 seconds.
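
A minimal sketch of this interpretation rule, assuming both fields have already been parsed (the function name is hypothetical):

    def max_picture_rate(constant_frame_rate_mode: int, frame_rate: int) -> float:
        if constant_frame_rate_mode == 0:
            return float(frame_rate)   # measured in frames per second
        return frame_rate / 1.001      # measured in frames per 1.001 seconds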

Thus, in some examples, for each respective operation point syntax element set of a plurality of operation point syntax element sets, a video processing device may signal a respective first syntax element (e.g., max_temporal_id) specifying a maximum temporal identifier of the respective operation point specified by the respective operation point syntax element set. Additionally, the video processing device may signal a respective second syntax element (e.g., average_bit_rate) specifying an average bit rate of the respective operation point specified by the respective operation point syntax element set. The video processing device may also signal a respective third syntax element (e.g., maximum_bit_rate) specifying a maximum bit rate of the respective operation point specified by the respective operation point syntax element set. The video processing device may signal a respective fourth syntax element (e.g., frame_rate) specifying a maximum picture rate of a High Efficiency Video Coding (HEVC) layered video stream corresponding to the respective operation point specified by the respective operation point syntax element set.

Similarly, in some examples, for each respective operation point syntax element set of the plurality of operation point syntax element sets, a video processing device may determine, based on a respective first syntax element (e.g., max_temporal_id) in the respective operation point syntax element set, a maximum temporal identifier of the respective operation point specified by the respective operation point syntax element set. The video processing device may also determine, based on a respective second syntax element (e.g., average_bit_rate) in the respective operation point syntax element set, an average bit rate of the respective operation point specified by the respective operation point syntax element set. Furthermore, the video processing device may determine, based on a respective third syntax element (e.g., maximum_bit_rate) in the respective operation point syntax element set, a maximum bit rate of the respective operation point specified by the respective operation point syntax element set. Moreover, the video processing device may determine, based on a respective fourth syntax element (e.g., frame_rate) in the respective operation point syntax element set, a maximum picture rate of a High Efficiency Video Coding (HEVC) layered video stream corresponding to the respective operation point specified by the respective operation point syntax element set.

As indicated above, particular techniques of this disclosure modify the aggregation of the elementary stream. In accordance with some examples, the HEVC layer list for one or more operation points is specified as follows: If the program map table (PMT) contains an hevc_extension_descriptor, the aggregation of the layers indicated as included in the operation point by the syntax element layer_id_included, ordered according to increasing LayerId value, results in the HEVC layer list. Otherwise, each elementary stream ES_(i) with stream type 0x24, 0x27 or 0x29 corresponds to a single target operation point OP_(i). The aggregation of the layers included in ES_(i) and the elementary streams pointed to by the syntax element hierarchy_ext_embedded_layer_index of the hierarchy_extension_descriptor for ES_(i), if present, ordered according to increasing LayerId, results in the HEVC layer list. If an ES signaled by hierarchy_ext_embedded_layer_index has further dependencies, these dependencies shall be prepended in a recursive manner. Each elementary stream ES_(j) with stream type 0x25, 0x28 or 0x2A is considered part of the operation point associated with the elementary stream that it enhances.
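
The fallback branch of this rule can be sketched as follows. The ElementaryStream model is hypothetical, standing in for the stream_type, carried LayerId values, and hierarchy_ext_embedded_layer_index values that would be recovered from the PMT; dependencies are resolved recursively and prepended, and the result is ordered by increasing LayerId.

    from dataclasses import dataclass, field

    @dataclass
    class ElementaryStream:                  # hypothetical model of an ES
        stream_type: int
        layer_ids: list                      # LayerId values carried in this ES
        embedded_layer_indices: list = field(default_factory=list)

    def hevc_layer_list(es, streams_by_index):
        # Prepend the layers of every ES this one depends on, recursively,
        # then add this ES's own layers; order by increasing LayerId.
        layers = []
        for idx in es.embedded_layer_indices:
            layers.extend(hevc_layer_list(streams_by_index[idx], streams_by_index))
        layers.extend(es.layer_ids)
        return sorted(set(layers))

    # Without an hevc_extension_descriptor in the PMT, each ES with stream
    # type 0x24, 0x27 or 0x29 is its own target operation point:
    # layer_lists = [hevc_layer_list(es, streams_by_index)
    #                for es in streams_by_index.values()
    #                if es.stream_type in (0x24, 0x27, 0x29)]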

FIG. 4A is a flowchart illustrating a first example operation to process video data, in accordance with a technique of this disclosure. The flowcharts of this disclosure are examples. Other examples in accordance with techniques of this disclosure may include more, fewer, or different actions. Furthermore, in some examples, actions may be performed in different orders or in parallel.

In the example of FIG. 4A, a video processing device, such as MANE 17, source device 12, or another device, determines whether a current program element enhances a frame rate of a bitstream (400). In the example of FIG. 4A, the bitstream may result from a set of one or more program elements that need to be accessed and be present in decoding order before decoding the current program element.

Furthermore, the video processing device includes, in a descriptor corresponding to the current program element, syntax elements indicating layer indices of the program elements that need to be accessed and be present in decoding order before decoding the current program element (402). The video processing device includes, in the descriptor corresponding to the current program element, an indication of whether the current program element enhances the frame rate of the bitstream (404).
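
Steps (402) and (404) can be sketched together as follows, reusing the TEMPORAL_ENHANCEMENT_INDEX constant from the earlier sketch. Modeling the descriptor as a plain dictionary is purely an illustrative assumption.

    def build_hierarchy_extension_descriptor(embedded_layer_indices,
                                             enhances_frame_rate: bool) -> dict:
        bits = 0
        if enhances_frame_rate:
            # Step (404): a single bit indicates frame-rate enhancement.
            bits |= 1 << (15 - TEMPORAL_ENHANCEMENT_INDEX)
        return {
            # Step (402): layer indices of the program elements that must be
            # accessed, and be present in decoding order, before this one.
            "hierarchy_ext_embedded_layer_index": list(embedded_layer_indices),
            "extension_dimension_bits": bits,
        }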

FIG. 4B is a flowchart illustrating a second example operation to process video data, in accordance with a technique of this disclosure. In the example of FIG. 4B, a video processing device, such as MANE 17, destination device 14, or another device, determines, based on syntax elements in a descriptor corresponding to a current program element, program elements that need to be accessed and be present in decoding order before decoding the current program element (450).

Furthermore, the video processing device determines, based on an indication in the descriptor corresponding to the current program element, whether the current program element enhances the frame rate of a bitstream (452). In the example of FIG. 4B, the bitstream results from a set of one or more program elements that need to be accessed and be present in decoding order before decoding the current program element.

FIG. 5A is a flowchart illustrating a third example operation to process video data, in accordance with a technique of this disclosure. The operation of FIG. 5A may be performed in conjunction with the example operation of FIG. 4A. In the example of FIG. 5A, a video processing device, such as source device 12, MANE 17, or another device, signals, in a descriptor for a program comprising one or more elementary streams, a plurality of PTL syntax element sets (500).

Additionally, the video processing device signals, in the descriptor or a different descriptor (e.g., a first or a second descriptor) for the program, a plurality of operation point syntax element sets (504). In the example of FIG. 5A, each respective operation point syntax element set of the plurality of operation point syntax element sets specifies a respective operation point of the plurality of operation points. For each respective layer of the respective operation point, the respective operation point syntax element set includes a respective syntax element identifying a respective PTL syntax element set of the plurality of PTL syntax element sets specifying PTL information assigned to the respective layer of the respective operation point. The respective operation point may have a plurality of layers. The first and/or the second descriptor may be in a transport stream. In other examples, the first and/or second descriptors are in a program stream or elsewhere.

FIG. 5B is a flowchart illustrating a fourth example operation to process video data, in accordance with a technique of this disclosure. The operation of FIG. 5B may be performed in conjunction with the example operation of FIG. 4B.

In the example of FIG. 5B, a video processing device, such as MANE 17, destination device 14, or another device, obtains, from a descriptor for a program comprising one or more elementary streams, a plurality of PTL syntax element sets (550). Each respective PTL syntax element set of the plurality of PTL syntax element sets comprises syntax elements specifying respective PTL information. Additionally, the video processing device obtains, from the descriptor or another descriptor (e.g., a first or a second descriptor) for the program, a plurality of operation point syntax element sets (552). Each respective operation point syntax element set of the plurality of operation point syntax element sets specifies a respective operation point of a plurality of operation points. The first and/or the second descriptor may be in a transport stream. In other examples, the first and/or second descriptors are in a program stream or elsewhere.

For each respective operation point syntax element set of the plurality of operation point syntax element sets, the video processing device determines, for each respective layer of the respective operation point specified by the respective operation point syntax element set, based on a respective syntax element in the respective operation point syntax element set, which of the PTL syntax element sets specifies the PTL information assigned to the respective layer (554). The respective operation point may have a plurality of layers.

The following paragraphs list a selection of examples of this disclosure.

Example 1

A method of processing video data, the method comprising: including, in a hierarchy extension descriptor, an indication of temporal scalability.

Example 2

The method of example 1, wherein the indication is part of an extension dimension bits syntax element that indicates possible enhancements of an associated program element from a base layer resulting from the program element of the base layer.

Example 3

The method of example 2, wherein all bits of the extension dimension bits syntax element being equal to a particular value indicates temporal enhancement.

Example 4

The method of examples 2 or 3, wherein a reserved bit of the extension dimension bits syntax element indicates temporal scalability.

Example 5

The method of any of examples 2-4, wherein the extension dimension bits syntax element includes an additional bit that indicates temporal scalability.

Example 6

A method of processing video data, the method comprising: obtaining, from a hierarchy extension descriptor, an indication of temporal scalability.

Example 7

The method of example 6, wherein the indication is part of an extension dimension bits syntax element that indicates possible enhancements of an associated program element from a base layer resulting from the program element of the base layer.

Example 8

The method of example 7, wherein all bits of the extension dimension bits syntax element being equal to a particular value indicates temporal enhancement.

Example 9

The method of any of examples 7 or 8, wherein a reserved bit of the extension dimension bits syntax element indicates temporal scalability.

Example 10

The method of any of examples 7-9, wherein the extension dimension bits syntax element includes an additional bit that indicates temporal scalability.

Example 11

A method of processing video data, the method comprising: signaling, in a descriptor for a program, a set of profile, tier, level (PTL) information, wherein the PTL information includes profile, tier, and level information.

Example 12

The method of example 11, further comprising: signaling, in the descriptor for the program, a list of operation points that are available for the program.

Example 13

The method of any of examples 11 or 12, wherein the descriptor is a first descriptor, the method further comprising: signaling, in a second descriptor for the program, a list of operation points that are available for the program.

Example 14

The method of any of examples 11-13, wherein each layer included in one of the operation points as a layer to be decoded corresponds to an index that refers to a set of PTL information from among one or more sets of PTL information.

Example 15

The method of any of examples 11-14, wherein each layer included in one of the operation points as an output layer corresponds to an index that refers to a set of PTL information from among one or more sets of PTL information.

Example 16

A method of processing video data, the method comprising: obtaining, from a descriptor for a program, a set of profile, tier, level (PTL) information, wherein the PTL information includes profile, tier, and level information.

Example 17

The method of example 16, further comprising: obtaining, from the descriptor for the program, a list of operation points that are available for the program.

Example 18

The method of any of examples 16 or 17, wherein the descriptor is a first descriptor, the method further comprising: obtaining, from a second descriptor for the program, a list of operation points that are available for the program.

Example 19

The method of any of examples 16-18, wherein each layer included in one of the operation points as a layer to be decoded corresponds to an index that refers to a set of PTL information from among one or more sets of PTL information.

Example 20

The method of any of examples 16-19, wherein each layer included in one of the operation points as an output layer corresponds to an index that refers to a set of PTL information from among one or more sets of PTL information.

Example 21

A method of processing video data, the method comprising: if a descriptor that carries operation point information is present for a program, establishing a High Efficiency Video Coding (HEVC) layer list for each respective operation point described in the descriptor based on information for the respective operation point, the HEVC layer list containing layers that are included for the respective operation point; and if the descriptor that carries operation point information is not present for the program, determining that each elementary stream with stream type 0x24, 0x27, or 0x29 corresponds to a single target operation point.

Example 22

The method of example 21, wherein the descriptor is in a program map table.

Example 23

A device for processing video data, the device comprising: a memory configured to store the video data, and one or more processors configured to perform the methods of any of examples 1-22.

Example 24

A device for processing video data, the device comprising means for performing the methods of any of examples 1-22.

Example 25

A computer-readable data storage medium having instructions stored thereon that, when executed, cause one or more processors to perform the methods of any of examples 1-22.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more DSPs, general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Furthermore, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for processing video data, such as that in a MANE. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless communication device (e.g., a wireless handset), an integrated circuit (IC) or a set of ICs (e.g., a chip set). For instance, a device for processing video data may comprise an integrated circuit comprising a video decoder configured to decode the encoded video data, a microprocessor comprising a video decoder configured to decode the encoded video data, a wireless handset comprising a video decoder configured to decode the encoded video data, and so on. Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

What is claimed is:
1. A method of processing video data, the method comprising: determining, based on syntax elements in a descriptor corresponding to a current program element, program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and determining, based on an indication in the descriptor corresponding to the current program element, whether the current program element enhances the frame rate of a bitstream, the bitstream resulting from a set of one or more program elements that need to be accessed and be present in decoding order before decoding the current program element.

2. The method of claim 1, wherein the indication is part of a syntax element that indicates enhancements of the current program element, relative to a base layer.

3. The method of claim 2, wherein a single bit of the syntax element indicates whether the current program element enhances the frame rate of the bitstream.

4. The method of claim 2, wherein all bits of the syntax element being equal to a particular value indicates the current program element enhances the frame rate of the bitstream.

5. The method of claim 4, wherein the particular value is equal to 0.

6. The method of claim 2, wherein the syntax element consists of 17 bits and a last bit of the syntax element indicates whether the current program element enhances the frame rate of the bitstream.

7. The method of claim 1, wherein the indication consists of a 1-bit flag separate from a syntax element indicating enhancements of the current program element, relative to a base layer.

8. The method of claim 1, wherein each of the program elements corresponds to a respective temporal sub-layer.

9. A method of processing video data, the method comprising: determining whether a current program element enhances a frame rate of a bitstream; including, in a descriptor corresponding to the current program element, syntax elements indicating layer indices of program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and including, in the descriptor corresponding to the current program element, an indication of whether the current program element enhances the frame rate of the bitstream.

10. The method of claim 9, wherein the indication is part of a syntax element that indicates enhancements of the current program element, relative to a base layer.

11. The method of claim 10, wherein a single bit of the syntax element indicates whether the current program element enhances the frame rate of the bitstream.

12. The method of claim 10, wherein all bits of the syntax element being equal to a particular value indicates the current program element enhances the frame rate of the bitstream.

13. The method of claim 12, wherein the particular value is equal to 0.

14. The method of claim 10, wherein the syntax element consists of 17 bits and a last bit of the syntax element indicates whether the current program element enhances the frame rate of the bitstream.

15. The method of claim 9, wherein the indication consists of a 1-bit flag separate from a syntax element indicating enhancements of the current program element, relative to a base layer.

16. The method of claim 9, wherein each of the program elements corresponds to a respective temporal sub-layer.

17. A device for processing video data, the device comprising: one or more data storage media configured to store encoded video data; and one or more processors configured to: determine, based on syntax elements in a descriptor corresponding to a current program element comprising the encoded video data, program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and determine, based on an indication in the descriptor corresponding to the current program element, whether the current program element enhances the frame rate of a bitstream, the bitstream resulting from a set of one or more program elements that need to be accessed and be present in decoding order before decoding the current program element.

18. The device of claim 17, wherein the indication is part of a syntax element that indicates enhancements of the current program element, relative to a base layer.

19. The device of claim 18, wherein a single bit of the syntax element indicates whether the current program element enhances the frame rate of the bitstream.

20. The device of claim 18, wherein all bits of the syntax element being equal to a particular value indicates the current program element enhances the frame rate of the bitstream.

21. The device of claim 20, wherein the particular value is equal to 0.

22. The device of claim 18, wherein the syntax element consists of 17 bits and a last bit of the syntax element indicates whether the current program element enhances the frame rate of the bitstream.

23. The device of claim 17, wherein the indication consists of a 1-bit flag separate from a syntax element indicating enhancements of the current program element, relative to a base layer.

24. The device of claim 17, wherein each of the program elements corresponds to a respective temporal sub-layer.

25. The device of claim 17, wherein the device comprises at least one of: an integrated circuit comprising a video decoder configured to decode the encoded video data; a microprocessor comprising a video decoder configured to decode the encoded video data; or a wireless handset comprising a video decoder configured to decode the encoded video data.

26. The device of claim 17, wherein the one or more processors decode the encoded video data, the device comprising a display configured to display the decoded video data.

27. A device for processing video data, the device comprising: a data storage medium configured to store encoded video data; and one or more processors configured to: determine whether a current program element comprising the encoded video data enhances a frame rate of a bitstream; include, in a descriptor corresponding to the current program element, syntax elements indicating layer indices of program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and include, in the descriptor corresponding to the current program element, an indication of whether the current program element enhances the frame rate of the bitstream.

28. The device of claim 27, wherein the indication is part of a syntax element that indicates enhancements of the current program element, relative to a base layer.

29. The device of claim 28, wherein a single bit of the syntax element indicates whether the current program element enhances the frame rate of the bitstream.

30. The device of claim 28, wherein all bits of the syntax element being equal to a particular value indicates the current program element enhances the frame rate of the bitstream.

31. The device of claim 30, wherein the particular value is equal to 0.

32. The device of claim 28, wherein the syntax element consists of 17 bits and a last bit of the syntax element indicates whether the current program element enhances the frame rate of the bitstream.

33. The device of claim 27, wherein the indication consists of a 1-bit flag separate from a syntax element indicating enhancements of the current program element, relative to a base layer.

34. The device of claim 27, wherein the device comprises at least one of: an integrated circuit; a microprocessor; or a wireless handset.

35. A device for processing video data, the device comprising: means for determining, based on syntax elements in a descriptor corresponding to a current program element, program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and means for determining, based on an indication in the descriptor corresponding to the current program element, whether the current program element enhances the frame rate of a bitstream, the bitstream resulting from a set of one or more program elements that need to be accessed and be present in decoding order before decoding the current program element.

36. A device for processing video data, the device comprising: means for determining whether a current program element enhances a frame rate of a bitstream; means for including, in a descriptor corresponding to the current program element, syntax elements indicating layer indices of program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and means for including, in the descriptor corresponding to the current program element, an indication of whether the current program element enhances the frame rate of the bitstream.

37. A computer-readable data storage medium having instructions stored thereon that, when executed, cause one or more processors to: determine, based on syntax elements in a descriptor corresponding to a current program element, program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and determine, based on an indication in the descriptor corresponding to the current program element, whether the current program element enhances the frame rate of a bitstream, the bitstream resulting from a set of one or more program elements that need to be accessed and be present in decoding order before decoding the current program element.

38. A computer-readable data storage medium having instructions stored thereon that, when executed, cause one or more processors to: determine whether a current program element enhances a frame rate of a bitstream; include, in a descriptor corresponding to the current program element, syntax elements indicating layer indices of program elements that need to be accessed and be present in decoding order before decoding the current program element, the descriptor being in a transport stream; and include, in the descriptor corresponding to the current program element, an indication of whether the current program element enhances the frame rate of the bitstream.